Want to know what PacBio users have been up to? In this summary, get the latest on some of the hottest scientific publications to incorporate PacBio sequencing methods. Selections from the month of August 2023 include papers on human genomics, Mendelian genetics, and rare disease research.
Read the bullet points, take a look at the original pieces, and see how PacBio sequencing technology is enabling your peers around the world to make high-impact discoveries that are advancing biology across disciplines!
Human genomics
A historic milestone for genomics was achieved in August 2023 with the publication of not one but two Nature papers illuminating Y chromosome biology and completing the last unfinished human chromosome.
Article: The complete sequence of a human Y chromosome
Telomere-to-Telomere (T2T) consortium scientists led by the United States’ National Institutes of Health (NIH) used PacBio HiFi sequencing to finish the final piece of the first complete human genome. Until now, the Y chromosome had been notoriously difficult to assemble due to its highly repetitive sequence.
Key takeaways:
- In explaining their motivations, the authors point out that “more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished.”
- In the paper the “…[T2T] consortium presents the complete 62,460,029 base pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference.”
- The authors conclude by stating that “we have combined T2T-Y with a prior assembly of the CHM13 genome and mapped available population variation, clinical variants, and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.”
Article: Assembly of 43 human Y chromosomes reveals extensive complexity and variation
The Human Genome Structural Variation Consortium (HGSVC), led by researchers at the Jackson Laboratory, performed a pangenome analysis of the human Y chromosome across a diverse set of ancestries. After selecting Verkko as their assembly tool of choice, the group combined HiFi sequencing with other long-read sequencing methods (including the PacBio Iso-Seq method) to make fundamental, paradigm-shifting discoveries.
Key takeaways:
- From their analysis, the authors point out that “the size of the Y chromosome assemblies varies extensively from 45.2 to 84.9 Mbp” – an almost 2-fold difference!
- They further demonstrate that “half of the male-specific euchromatic region is subject to large (up to 5.94 Mbp) inversions with a >2-fold higher recurrence rate compared to the rest of the human genome.”
- The authors “…have identified on average 65 Kb of novel sequence per Y chromosome.” They add: “Examination of the full extent of genetic variation between Y chromosomes across 180,000 years of human evolution reveals its remarkable complexity and diversity in size and structure, in contrast with its low-level of base substitution variation.” This highlights that most of the genetic variation in this chromosome is structural in nature.
- The team concludes that “the availability of sequence-resolved Y chromosomes from multiple samples provides a basis for identifying new associations of specific traits with the Y chromosome and garnering novel evolutionary insights.”
Mendelian genetics
Review: Applications of long-read sequencing to Mendelian genetics
In this review, University of Washington-Seattle researchers highlight the benefits of long-read sequencing for both read-based and assembly-based variant calling to increase the explanation rate of human genetic disorders.
Key takeaways:
- The Revio system, along with single-cell and bulk MAS-Seq methods, is highlighted, with the authors stating that “the recent launch of Revio by PacBio, which promises a highly accurate sequence of a human genome for $1000 in materials per human sample, represents an important shift in strategy and will allow more researchers to access high-quality long-read whole genome sequencing (LR WGS) data.”
- The authors conclude that it is “tempting to speculate when LR WGS might emerge as a single test for clinical samples. Despite being more expensive than short-read whole genome sequencing (SR WGS), LR WGS advantages are clear: improved variant discovery (particularly for SVs), physical phasing of genomes, simultaneous discovery of methylation differences and genetic variants without additional experiments, and the ability to reanalyze a single dataset based on clinical suspicion. [LR WGS] is, in principle, the most comprehensive test currently available as it has the potential to fully sequence-resolve both maternal and paternal chromosomes of a patient. If de novo assemblies of patient genomes and their parents become routine, it fundamentally changes how variants are discovered. Instead of read-based discovery, parent-to-offspring comparison of fully resolved chromosomes can be made to discover genetic and epigenetic changes of both small and large effect. As the disadvantages, including, cost, throughput, and computational overhead are resolved, LRS will become a more attractive option to human genetics researchers and clinicians alike.”
Rare disease
Review: Beyond the exome: What’s next in diagnostic testing for Mendelian conditions
In this review article, scientists from the GREGoR consortium (Genomics Research to Elucidate the Genetics of Rare diseases) provide a framework for next steps in exome-negative cases of human genetic disease. Published examples of long-read sequencing benefits in such cases (e.g., SV detection, phasing, and native methylation calls) are reviewed, and the authors highlight opportunities for further development of data analysis methodologies.
Key takeaways:
- The authors point out that “Limited population-level data exist for SVs, SNVs, and indels in regions refractory to analysis with short reads. Ongoing efforts such as those from the All of Us project and long-read genome sequencing (lrGS) of samples from the 1000 Genomes Project will address this limitation but will take several years to complete. In addition, as with short-read genome sequencing (srGS) data, there are few tools for interpretation of noncoding SNVs, indels, and SVs.” They add that “frequently changing pipelines and limited reference datasets (especially from diverse populations) for filtering and prioritizing of variants identified by long-read sequencing creates challenges.”
- On the prospect of integrated multi-omics: “we envision a time when a single data source, such as srGS or lrGS, is evaluated in a stepwise fashion, perhaps enhanced by concurrent methylome or transcriptome, or metabolomic analysis that replaces the time-consuming progression of microarray, panel testing, and ES [exome sequencing].”
- The authors conclude: “In the near future, we anticipate that a single test, such as srGS, will be used to simplify the evaluation process and reduce inequities in care, with lrGS replacing or supplementing this data as costs fall over time.”
Ready to kickstart breakthroughs of your own?