A new preprint entitled “Utility of long-read sequencing for All of Us” – from research groups at Baylor College of Medicine, the Broad Institute, Jackson Laboratory, Discovery Life Sciences, Harvard Medical School, Johns Hopkins University, Massachusetts General Hospital, the University of Washington, HudsonAlpha Institute for Biotechnology, and Rice University – argues that now is the time to apply long-read sequencing in population-scale projects. The authors compare PacBio HiFi, ONT, and Illumina sequencing across a cohort of four HapMap samples and two All of Us (AoU) controls. The researchers observed “substantial differences in the ability of these technologies to accurately sequence complex medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification.” As the main finding, the authors state that “HiFi reads produced the most accurate results for both small and large variants” and conclude that “long-reads have widespread value for establishing the most complete and accurate variant calls for All of Us and potentially for many other projects.”
The study reports that PacBio HiFi sequencing is best-in-class for calling variants, even at lower coverage than the other technologies. In contrast, “Illumina-based samples had a much higher coverage (>30x coverage) but suffered from major inherent biases in SV detection.” The authors conclude that “simple comparisons of raw sequencing coverage or other simple metrics are not sufficient to evaluate the utility of a sequencing technology.” Other studies have reported the same, and we have highlighted this important consideration in a previous blog post.
Comparing the two long-read technologies, the researchers found that “while read length is frequently suggested as a dominant factor that may favor ONT, our results demonstrate that the benefits of read length are overshadowed by the higher sequencing accuracy of the HiFi technology.”
The long-read game for all of us
The study concludes with an important future perspective and recommendation. Noting that in previous studies too, and critically “in the ACMG, which represents a crucial list of genes in the medical field, long-read sequencing demonstrated its efficiency in sequencing those genes and reporting variants more accurately compared to short reads,” the authors posit that “the question rises if we have entered the age of using long-reads exclusively.” Addressing throughput and cost questions, the authors comment that this study “as well as other projects demonstrate that long-read technologies are not far behind” relative to Illumina technology, and explicitly mention the “new high throughput Revio system … with capacity for thousands of genomes per year per instrument.” They conclude: “we believe that the long-read technologies are advancing rapidly in these directions so that All of Us and the genomics community at large can now confidently begin such large-scale initiatives.”
Observing that “this study shows the strong value of long-reads for simple and complex medically relevant genes and gives clear indications that long-reads are on par with if not better than the short-reads,” the authors recommend that “All of Us and other population-scale projects should investigate the usage of long-reads at scale and how to utilize and understand the clinical relevance of the so-obtained novel alleles in the setting of larger short-read cohorts,” and conclude that “we should continue developing population-scale cohorts sequenced with long reads only.”
We are very excited to partner with the scientific community on this paradigm shift towards large-scale population genomics initiatives that will deliver the highest quality of sequence data for the most accurate and comprehensive calling of all types of genetic variants. Please connect with us to discuss how your project can benefit from high-throughput PacBio HiFi sequencing on the Revio system, for long-lasting benefits to all of us.