PacBio HiFi sequencing technology continues to be the tool of choice for genomics professionals working at the forefront of discovery, enabling them to pursue new avenues of exploration across diverse domains of biology.
In this edition of our Powered by PacBio blog series, we highlight scientific papers from the month of October 2024. From improved resolution of complex genomic structural variation in pediatric sensorineural hearing loss, to helping support the creation of a new catalog of tandem repeats for large-scale analyses and population databases, these papers demonstrate how PacBio technology is enabling deeper insights across diverse areas of genomic research.
Jump to topic:
Rare and inherited disease | Tandem repeats + Complex regions | RNA | Human research
Rare and inherited disease
Long-Read Sequencing Increases Diagnostic Yield for Pediatric Sensorineural Hearing Loss
In this preprint, authors from Boston Children’s, Harvard, and PacBio find that HiFi sequencing “provided significantly improved resolution for complex structural variation and, in this cohort, substantially improved diagnostic yield over ES and srGS”. Diagnostic yield for pediatric hearing loss “has remained at around 40% for over a decade despite newly discovered causative genes and the expanded use of exome sequencing (ES)”. The study investigates PacBio HiFi sequencing “in the diagnostic evaluation of a small cohort of patients [5-10 yrs old] with SNHL of unknown etiology after ES and srGS.”
Key findings:
• The study explained 4 out of 19 cases (21%), including “a hemizygous deletion in trans with a missense variant in an area of high genomic homology … two single nucleotide loss-of-function variants in trans to a known copy-number-loss for a gene with a highly homologous pseudogene,” and a “complex inversion.”
Conclusion:
PacBio “provides improved resolution for complex genomic structural variation which may increase diagnostic yield for genetic pediatric SNHL, and, potentially, rare disease more broadly.”
Tandem repeats + Complex regions
In this preprint, researchers from the Broad, Harvard, PacBio, Australia, U Miami, Baylor, U UT, UCSD, and U CO present “a new, richly annotated [tandem repeat] catalog designed for large-scale analyses and population databases”. Calling TRs consistently is challenging, as it is “highly sensitive to input parameters” and “defining the starts and ends of TR loci in the reference genome is often subject to ambiguity”, leading to “issues with interpretation of TR genotypes” and “challenges of creating genome-wide TR catalogs suitable for population studies, especially multi-center studies employing different computational tools and sequencing technologies.”
The researchers employed “both cohort-based and reference-based TR identification approaches to capture a comprehensive set of loci, including those that harbor common or rare variation”. They “stratif[ied] TRs into two groups: isolated TRs suitable for repeat copy number analysis using short read or long read data and so-called variation clusters that contain TRs within wider polymorphic regions”, noting that “Due to their complexity, variation clusters are best studied through sequence level analysis, particularly using long read data.”
Authors noted:
• PacBio’s “new method for profiling population-scale variation around TRs and other regions of the genome” provides “population allele frequencies, gene regions, variation cluster properties, and other annotations for TRs in the catalog that can be used to assist with interpretation of results or to filter the full catalog to a smaller subset that is most relevant to particular study objectives.”
• Their work presents “a novel algorithm that leverages long-read HiFi sequencing data to group repeats with surrounding polymorphisms. We show that the human genome contains at least 25,000 complex variation clusters, most of which span over 120 bp and contain five or more TRs”. “Resolving the sequence of entire variation clusters instead of individually genotyping constituent TRs leads to a more accurate analysis of these regions and enables us to profile variation that would have been missed otherwise.”
Conclusion:
With high-quality, PCR-free WGS and comprehensive methylation profiling on the Revio system, HiFi reads provide precise sequence-level analysis of tandem repeat clusters, capturing variation that short reads would otherwise miss. The result is a more accurate and unified TR catalog that accelerates research timelines, reduces costs, and enhances the compatibility and reuse of genomic data across studies, ultimately advancing effective and personalized genomics research.
RNA
A proteogenomic atlas of the human neural retina
In this article, researchers from Radboud University and Maastricht University (Netherlands), Ghent University (Belgium), and Italy created “a proteogenomic atlas that combines PacBio long-read RNA-sequencing data with mass spectrometry and whole genome sequencing data of three healthy human neural retina samples”.
Key takeaways:
• “more than 10% of genetic variants linked to inherited retinal diseases (IRDs) alter splicing”
• The study “identified nearly 60,000 transcript isoforms, of which approximately one-third are novel”
• “For 35 out of 294 RetNet genes, the transcript with the novel ORF demonstrated higher expression than all other transcripts from the same gene containing a known ORF.”
• Authors note that this study “highlights the need to study tissue-specific transcriptomes in more detail for better understanding of tissue-specific regulation and for finding disease-causing variants” and provides a “reference for future retina and IRD research to contribute to missing heritability in IRD patients.”
Conclusion:
The Iso-Seq workflow, now scalable with Kinnex library prep kits and sequencing on Revio, captures full-length transcripts with high precision, offering critical insights into disease pathogenesis at a cost-effective scale. By revealing transcript diversity that traditional methods miss, it enables more accurate findings that can contribute to the development of targeted therapies, reducing both research and healthcare costs while advancing rare and complex disease understanding.
Human research
The Platinum Pedigree: A long-read benchmark for genetic variants
In this preprint, researchers from PacBio, Google, NIST, UW, U TN, U UT, and Terry Fox Lab added to previous work, creating a more comprehensive variant calling truth set, using Mendelian inheritance in a large (two parents and eight siblings) pedigree. “This work adds ~200 Mb of high-confidence regions, including 8% more small variants, and introduces the first tandem repeat and structural variant truth sets for NA12878”, and “retrained DeepVariant using this data to reduce genotyping errors by ~34%.”
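To illustrate the general idea of using Mendelian inheritance as a consistency filter, here is a minimal Python sketch that checks whether each child's genotype at a single site could have come from the two parents. It is a simplified illustration under stated assumptions (diploid genotypes written as allele pairs, no de novo mutations, no phasing or haplotype tracking) and not the method used in the preprint. The function names and sample genotypes are hypothetical.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's unordered genotype can be formed by
    taking one allele from the father and one allele from the mother."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f, m))) == child_sorted
        for f, m in product(father, mother)
    )


def consistent_in_all_children(father, mother, children):
    """Return True only if every child's genotype passes the check."""
    return all(is_mendelian_consistent(father, mother, c) for c in children)


# Hypothetical genotypes at one site (0 = reference allele, 1 = alternate allele)
father = (0, 1)
mother = (0, 0)
children = [(0, 0), (0, 1), (1, 0)]

print(consistent_in_all_children(father, mother, children))  # True
print(is_mendelian_consistent(father, mother, (1, 1)))       # False
```

A call that fails this check in any child is either a genotyping error or a genuine de novo event, which is why a large sibship gives many independent chances to catch errors at each site.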
Conclusion:
HiFi WGS at Revio scale makes larger and more diverse long-read datasets available, supporting the catalogs and tools needed to establish, for the first time, the complete and true extent of genetic variation across human populations, cohorts, and cancer.
Ready to kickstart breakthroughs of your own?
These recent studies highlight just how versatile and powerful PacBio sequencing can be, from resolving complex structural variation in pediatric hearing loss to cataloging tandem repeats and building long-read variant benchmarks. PacBio technology continues to open new doors for scientific discovery.
And now, with flexible financing options and partnerships with certified providers, research teams of any size can access PacBio data. Learn how to incorporate PacBio data into your next project: