In recent years, whole exome sequencing (WES) has become a popular method for identifying the genetic underpinnings of disease in both research and clinical settings. However, because an exome is confined to protein-coding regions (about 1.5% to 2% of the genome), it cannot account for variants that fall outside this narrow subset of genetic information [1]. Whole genome analysis based on traditional short-read sequencing technology can expand the search area somewhat, but short-read genomes often contain numerous errors and gaps of missing information [2,3]. Together, these limitations present researchers and clinicians with a problem: how do we account for the full range of human genetic variation and use that knowledge to identify and characterize genetic diseases more effectively? The answer: HiFi whole genome sequencing on the new Revio system.
Rethink what can be accomplished with a whole genome using PacBio long-read sequencing
Because more than 80% of the protein-coding regions of the human genome are ≤200 base pairs (bp) long [4], exome analysis is well suited to short-read instruments, which sequence DNA in similarly sized fragments of 100 bp to 300 bp. However, it is becoming increasingly clear that a wealth of genomic variation, including disease-causing variants, lies outside exons in ways that are difficult or impossible to detect with short-read sequencers. To address this, some laboratories have begun to offer short-read genome sequencing as a fallback, or "reflex," option when exome analysis cannot explain a disease phenotype. However, a growing body of literature calls this approach into question by showing that the clinical utility of short-read genomes is not significantly different from that of exomes [5,6]. With seemingly limited options for future discovery using short reads, the expanded throughput and highly accurate long reads of the Revio system may begin to turn the calculus of whole genome vs. whole exome sequencing on its head, potentially upending the nearly decade-long dominance of short reads in genomic medicine. This is especially true in rare disease research, where the extra genetic context provided by long reads is helping to explain cases that short-read technologies could not [7].
Genomic everything, everywhere, all at once
Much like the laundromat owner in the Academy Award-winning film Everything Everywhere All at Once, who is swept up into a multiverse of parallel realities, genomics professionals using the Revio system will find a whole new universe of analysis potential. This revolutionary sequencer lets researchers see everything a conventional short-read exome or whole genome can show (SNPs and the like). It then goes further, providing more complete coverage (including genomic dark regions) with 99.9% accurate reads, superior structural variant calling, haplotype phasing, and epigenetic detection, all from a single sequencing workflow. This more comprehensive suite of information lets researchers look beyond the proverbial SNP tables that define the current short-read standard. They can build a richer understanding of how disease phenotypes are influenced by repeat expansions, copy number variants, allelic disruptions in cis or in trans, and epigenetic modifications. Retrieving this kind of far-reaching genomic data is difficult or nearly impossible with conventional short-read instruments. Instead, researchers would have to run a battery of additional experiments, each with its own materials and labor costs.
A growing ecosystem for a whole new genome
With the launch of the Revio system and the expansion of HiFi sequencing capabilities, PacBio is focused on two goals. The first is helping scientists access the most biologically rich genomic data possible. The second is developing the tools and resources that make it easier to make impactful discoveries in regions of the genome missed by previous generations of sequencers. In addition to the Revio system, PacBio has built comprehensive SNP and structural variant (SV) detection workflows and released new analysis tools, including:
- TRGT: a tool for tandem repeat genotyping and visualization (say goodbye to Southern blots and PCR assays).
- Paraphase: a caller for highly homologous genes.
- HiFiCNV: a copy number variant caller and depth visualization utility.
These analysis packages help researchers extract more critical insights from their samples using software optimized for PacBio HiFi data. At the same time, PacBio worked with an international coalition of experts to launch the Consortium of Long Read Sequencing (CoLoRS). The consortium maintains a dedicated long-read sequencing database in which genomes and variants that can only be meaningfully observed with long reads can be vetted, cataloged, and used as references. The goal is to accelerate our understanding of human genomics, especially as it applies to human health.
This all sounds great, but isn't a PacBio whole genome expensive?
At about 1000 USD* per 30x human genome on the Revio system, accessing the full complement of genomic information for a truly "whole" whole genome is now more affordable and information-rich than ever. This is especially true when you consider the time and money saved by skipping the reflex testing needed to achieve similar insights with short-read genomes and exomes. Furthermore, the Revio system is optimized to produce one high-quality HiFi human whole genome per SMRT Cell with a ~24-hour turnaround time, so you do not have to wait to "batch" samples to hit that cost target. A 1000 USD* HiFi whole genome is exactly that.
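The cost argument above has two parts. First, reflex testing adds a second test for every case the first test leaves unexplained. Second, batching spreads a fixed run cost over however many samples happen to be loaded. The Python sketch below models both under simplifying assumptions.

The function names and every number in the sketch are hypothetical placeholders chosen for illustration. This includes the test prices, the diagnostic yield, the run cost, and the run capacity. None of them are PacBio pricing or published yields. The model also ignores turnaround time, labor, and any further follow-up assays.

```python
def expected_cost_per_case(first_cost: float, reflex_cost: float = 0.0,
                           first_yield: float = 0.0) -> float:
    """Expected per-case cost of a first-line test plus an optional reflex test.

    first_yield is the fraction of cases the first-line test explains; only the
    remaining (1 - first_yield) fraction goes on to the reflex test.
    """
    if not 0.0 <= first_yield <= 1.0:
        raise ValueError("first_yield must be between 0 and 1")
    return first_cost + (1.0 - first_yield) * reflex_cost


def cost_per_sample(run_cost: float, samples_in_run: int, capacity: int) -> float:
    """Per-sample cost when a fixed-price run is shared by samples_in_run samples."""
    if not 1 <= samples_in_run <= capacity:
        raise ValueError("samples_in_run must be between 1 and capacity")
    # An under-filled run costs the same as a full one, so each sample
    # carries a larger share of the run cost.
    return run_cost / samples_in_run


if __name__ == "__main__":
    # Hypothetical prices and yield, for illustration only.
    exome_then_reflex = expected_cost_per_case(600.0, reflex_cost=1000.0, first_yield=0.30)
    genome_first = expected_cost_per_case(1000.0)
    print(f"exome, then reflex genome: {exome_then_reflex:.0f} USD per case")
    print(f"genome first:              {genome_first:.0f} USD per case")

    # Hypothetical batched run with room for 8 samples: a full batch keeps the
    # per-sample cost down, while running early with only 2 samples does not.
    print(cost_per_sample(8000.0, 8, 8))
    print(cost_per_sample(8000.0, 2, 8))
    # A format with one sample per run unit (capacity 1) has no batch to fill.
    print(cost_per_sample(1000.0, 1, 1))
```

With these placeholder inputs, the exome-first path averages 1,300 USD per case against 1,000 USD for genome-first. The break-even point depends entirely on the inputs, though. A cheaper first-line test or a higher first-line yield would shift the balance back toward exome-first testing. Treat the sketch as a way to reason about the trade-off, not as evidence for either side.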
What about a long-read whole exome?
Long-read whole genomes generated with HiFi sequencing on the Revio system are made of highly accurate reads that are each 15,000 to 20,000 bp long. Meanwhile, most exons in the human genome are ≤200 bp long. This means the overwhelming majority (98.6%) of the sequence in a hypothetical HiFi long-read whole exome dataset would fall outside the targeted exons and be of no use in an exome analysis. Even a dedicated custom long-read exome panel composed of 2,000 bp reads would still be 90% off-target. Complicating matters further, long-read whole exome sample preparation takes approximately 3 days, compared to just 4 hours for a HiFi whole genome. The opportunity cost and labor expenses alone erode the benefits of a long-read whole exome and make a HiFi whole genome the faster and more cost-conscious choice. That said, some investigators want to leverage the advantages of HiFi sequencing in a targeted fashion for specific cases. For them, enrichment panels and PCR amplicon assays are available that deliver high-throughput, cost-effective long-read results for a variety of applications.
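The off-target percentages above follow from simple length ratios. The sketch below rests on a simplifying assumption that is not stated in the text: each read covers exactly one exon, fully contained within the read, and every other base on the read counts as off-target. The function name is hypothetical.

```python
def off_target_fraction(target_bp: int, read_bp: int) -> float:
    """Fraction of a read's bases that fall outside a single target interval.

    Simplified model (an assumption): the read spans one target of length
    target_bp, and no other base on the read is useful for the analysis.
    """
    if target_bp <= 0 or read_bp <= 0:
        raise ValueError("lengths must be positive")
    # A target longer than the read can contribute at most read_bp bases.
    on_target = min(target_bp, read_bp)
    return 1.0 - on_target / read_bp


if __name__ == "__main__":
    exon_bp = 200  # exon length used in the text
    for read_bp in (2_000, 15_000, 20_000):
        print(f"{read_bp:>6} bp read: {off_target_fraction(exon_bp, read_bp):.1%} off-target")
```

With a 200 bp exon, this gives 90.0% off-target for 2,000 bp reads, about 98.7% for 15,000 bp reads, and 99.0% for 20,000 bp reads. These values are close to the 90% and ~98.6% figures quoted above. Real capture designs would shift the numbers, because a read can span several exons or only part of one.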
Get more for less with HiFi whole genome sequencing
The world of genomics is changing. Long reads, which many once regarded as an immature technology, were named Method of the Year 2022 by Nature Methods. And with the Revio system, HiFi sequencing throughput now reaches more than 1,300 human HiFi whole genomes per year, with an information-rich, "batch-free" cost-effectiveness that changes the game for laboratories around the world. In many ways, the ongoing development of PacBio's third-generation sequencing technology heralds the dawn of a new era in genomic discovery. In this era, researchers striving to improve human health will finally have access to the genomic equivalent of everything, everywhere, all at once.
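The annual throughput figure can be sanity-checked against the per-SMRT Cell figures given earlier: one genome per SMRT Cell and a ~24-hour turnaround. The sketch below assumes continuous back-to-back runs all year with no downtime, which is optimistic. The helper names are hypothetical.

```python
import math


def genomes_per_day(genomes_per_year: float, days_per_year: float = 365.0) -> float:
    """Average daily output implied by an annual genome count."""
    return genomes_per_year / days_per_year


def concurrent_cells_needed(genomes_per_year: float, genomes_per_cell: float = 1.0,
                            run_hours: float = 24.0, days_per_year: float = 365.0) -> int:
    """Minimum number of SMRT Cells that must sequence in parallel to hit the target.

    Assumes back-to-back runs of run_hours each, all year, with no downtime.
    """
    runs_per_year = days_per_year * 24.0 / run_hours
    # Genomes one parallel SMRT Cell position can deliver in a year.
    genomes_per_position_per_year = genomes_per_cell * runs_per_year
    return math.ceil(genomes_per_year / genomes_per_position_per_year)


if __name__ == "__main__":
    print(f"{genomes_per_day(1300):.2f} genomes per day")
    print(concurrent_cells_needed(1300), "SMRT Cells in parallel")
```

Using the figures quoted in this post, 1,300 genomes a year works out to about 3.6 genomes per day. At one genome per SMRT Cell per ~24-hour run, that means at least four SMRT Cells sequencing in parallel. Any downtime between runs would raise that number.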
References
1. Venter JC, et al. The sequence of the human genome. Science. 2001 Feb 16;291(5507):1304-51. doi: 10.1126/science.1058040. Erratum in: Science. 2001 Jun 5;292(5523):1838. PMID: 11181995. https://pubmed.ncbi.nlm.nih.gov/11181995/
2. De Coster W, Weissensteiner MH, Sedlazeck FJ. Towards population-scale long-read sequencing. Nat Rev Genet. 2021;22:572–587. https://doi.org/10.1038/s41576-021-00367-3
3. Ebbert MTW, et al. Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight. Genome Biol. 2019;20:97.
4. Sakharkar MK, Chow VT, Kangueane P. Distributions of exons and introns in the human genome. In Silico Biol. 2004;4(4):387-93. PMID: 15217358.
5. Clark MM, Stark Z, Farnaes L, et al. Meta-analysis of the diagnostic and clinical utility of genome and exome sequencing and chromosomal microarray in children with suspected genetic diseases. NPJ Genom Med. 2018;3:16. https://doi.org/10.1038/s41525-018-0053-8
6. Kingsmore SF, Cakici JA, Clark MM, et al. A randomized, controlled trial of the analytic and diagnostic performance of singleton and trio, rapid genome and exome sequencing in ill infants. Am J Hum Genet. 2019 Oct 3;105(4):719-733. doi: 10.1016/j.ajhg.2019.08.009. Epub 2019 Sep 26. PMID: 31564432; PMCID: PMC6817534.
7. Cohen ASA, et al. Genomic answers for children: Dynamic analyses of >1000 pediatric rare disease genomes. Genet Med. 2022;24(6):1336-1348. https://doi.org/10.1016/j.gim.2022.02.007