More precise data provided by PacBio HiFi sequencing can markedly improve the resolving power of genetic association studies and reduce the cohort sizes they require. A recent preprint, Improved detection of evolutionary selection highlights potential bias from different sequencing strategies in complex genomic regions, by researchers from Children's Hospital of Philadelphia and collaborators, illustrates this for the detection of genomic regions under balancing selection. The researchers measured association signals in 497 array- and NGS exome-based clinical datasets, 3,500 NGS-based HLA typing datasets from the IHIW database, and 23 high-quality PacBio HiFi genomes from the Human Pangenome Reference Consortium (HPRC). The genome-wide linkage disequilibrium (LD) analysis, together with plots focusing on the MHC region, shows a large difference in signal-to-noise ratio, even though the PacBio cohort was more than ten-fold smaller.
Critically, the PacBio HiFi cohort analysis exposed several likely false-positive signals in the short-read NGS cohort analyses. The authors write: "Revealingly, with the African Pangenome samples, signals at SIRPB1 seen in the clinical samples were absent, indicating that they were likely artifacts of inaccurate [short-read] sequence mapping". Another false-positive signal from short-read NGS was observed in the MHC region: "A dramatic peak centered on intron 5 of [HLA]-DRB1 seen in the IHIW dataset was completely absent in the Pangenome analysis. This portion of DRB1 is known to have structural variation and repeat elements, hindering accurate mapping of shorter sequencing reads, and therefore likely causes artifactual LD in IHIW but not the Pangenome."
The authors concluded:
“Our results demonstrated that orders of magnitude smaller set of high-quality long-read sequencing data has the potential to more effectively characterizing genetic variation than larger sets of sequencing data from other platforms.”
The more precise sequence information afforded by PacBio HiFi sequencing also improves our understanding of genetic diversity and of which biomarker levels constitute 'normal' for different individuals. This is highlighted in a recent preprint, Genetically determining individualized clinical reference ranges for the biomarker tryptase can limit unnecessary procedures and unmask myeloid neoplasms, led by researchers from the National Institute of Allergy and Infectious Diseases (NIAID, NIH) with collaborators from 28 participating institutions. PacBio sequencing was used to resolve a complex region of the human genome that encodes tryptase, an enzyme whose elevated serum levels are a biomarker for certain myeloid neoplasms. The researchers observed that one of the genes in the locus, TPSAB1, is replicated in some individuals as part of a 15 kb tandem duplication.
The researchers also identified a series of unique proximal non-coding variants that distinguish replicated from non-replicated sequences. In addition, an expanded DNA motif within the 5′-UTR was linked to TPSAB1 replication-associated variants and showed increased in vitro promoter activity relative to the paralogous region in the non-replicated promoter.
Using these insights, the authors developed a genetics-based model of reference ranges for the biomarker, based on TPSAB1 replication number, thereby setting "new individualized clinical reference values for the upper limit of 'normal'". As a result, elevated biomarker levels that would ordinarily have prompted a bone marrow biopsy are now understood to be "within 'normal' limits for certain individuals" who carry the replication, potentially sparing them this invasive and expensive procedure.
These and other studies demonstrate how a more comprehensive view of individual genomes improves our understanding of the genetic basis of a person's health or disease status, ultimately allowing for better-informed treatment. This is the premise of precision medicine. By providing more accurate, more contiguous and more complete (in other words, more precise) genetic information, PacBio HiFi sequencing is a critical enabling research technology that can help fulfill the promise of genomics to improve human health through precision medicine.
PacBio products are for research use only, and are not to be used in diagnostic procedures.