PacBio HiFi sequencing technology continues to be the tool of choice for genomics professionals working at the forefront of discovery, enabling them to pursue new avenues of exploration across diverse domains of biology.
In this edition of our Powered by PacBio blog series, we highlight scientific papers from the month of July 2024. From advancing bioinformatics tools for the detection of de novo tandem repeat mutations, to resolving the impact of mosaic variants through targeted sequencing, and benchmarking strategies for metagenomics, these papers demonstrate how PacBio technology is pushing the boundaries of what’s possible in genomic research.
Mosaic variants + targeted sequencing
Resolving the chromatin impact of mosaic variants with targeted Fiber-seq
In this preprint, the authors find that “targeted Fiber-seq enables the production of targeted long-read sequencing chromatin maps to resolve the heterogeneity of genetic and chromatin architectures with single-molecule precision.”
Findings included:
- “short-read methods are ill-suited for identifying neighboring regulatory elements that may be altered by a variant …, resulting in a myopic view of how mosaic variants impact gene regulatory architectures.”
- Cas9-based targeted enrichment (~10-fold) of regions of interest, followed by Fiber-seq, “enables the synchronous measurement of DNA sequence, CpG methylation, and chromatin accessibility along the same +10kb molecule of DNA” to “directly disentangle the functional impact of heterogeneously present genetic variants on neighboring gene regulatory programs.”
- New insights into myotonic dystrophy type 1 (DM1): using “co-measurements of CpG methylation, chromatin accessibility, and single-molecule CTCF footprinting,” the authors show that pathogenic expansions of the DMPK CTG repeat “are characterized by somatic instability and disruption of multiple nearby regulatory elements, both of which are repeat length-dependent.”
- Greater insight into therapeutic base editing: the authors found that “base editing is often incomplete, which results in a mixture of edited and unedited haplotypes within a population of cells,” and that “the long-reads obtained by Fiber-seq enable the unique mapping of CpG methylation, chromatin architectures, and ABE-induced base changes to each of the segmentally duplicated HBG1 and HBG2 genes.”
Conclusion:
Fiber-seq continues to transform our understanding of gene regulation.
Metagenomics
In this study, researchers from Denmark and Sweden provide a detailed comparison of HiFi and ONT sequencing of the full-length 18S rRNA gene in “a precisely defined synthetic eukaryotic mock community.”
Key findings:
- PacBio allows direct amplicon sequencing, while ONT requires UMIs. “PacBio CCS provided the best coverage of the mock community with sequences for 40 out of 42 taxa. In contrast, the Nanopore methods only provided coverage for 21 and 32 sequences for the direct and clonal datasets, respectively.”
- “[The] highest number of correct ASVs [amplicon sequence variants] from the mock community was … found using the PacBio CCS data where 36 out of 40 unique reference sequences found in the raw data could be identified … At the same time, only two false-positive ASVs were called.” For Nanopore, “The number of ASVs was surprisingly low for the Nanopore UMI direct data” [7 out of 21; 24 out of 31 for clonal]. “A few false-positive ASVs were also observed despite the use of UMIs.”
- The authors also highlight differences in workflow (“PacBio library preparation is quicker”), challenges around UMIs (including “stochastic biases introduced by UMI-tagging and the difficulties posed by PCR with lengthy UMI primers, leading to inconsistent results,” and “The recent development of the R10.4 chemistry … still prevents the use of ASV-calling on these raw reads”), and bioinformatics burden and cost: “Since both the Sequel IIe and Revio deliver processed CCS directly, bioinformatics processing is substantially reduced. This advantage, combined with a much lower cost per consensus sequence, positions the PacBio platform as the current method of choice.”
Conclusion:
Accurate, high-throughput long-amplicon sequencing (including Kinnex 16S sequencing) enables comprehensive characterization of microbial communities and outperforms ONT in all aspects of targeted rRNA sequencing.
Bioinformatics
PacBio bioinformatics tools continue to expand, enabling more discoveries
TRGT-denovo: accurate detection of de novo tandem repeat mutations
In this preprint, researchers from PacBio, the University of Utah, Radboud (the Netherlands), Mount Sinai, and Genomics England (UK) found that TRGT-denovo, an extension of TRGT, identifies “all types of de novo TR mutations (including expansions, contractions, and compositional changes) within family trios.” The tool also detects subtle variations that are often overlooked in VCF files and “improves precision and specificity of de novo mutation (DNM) identification, reducing the number of de novo candidates by an order of magnitude compared to genotype-based approaches.”
Conclusion:
Through solutions like HiFi WGS and PureTarget, PacBio technology is the only known tool available to comprehensively characterize tandem repeats, the most common SV in the human genome, now including de novo tandem repeat mutations.
Immunology
In this preprint, researchers from Johns Hopkins University (JHU) and Penn State present what they describe as the “first work to systematically analyze and report non-canonical recombination events in long-read datasets and assemblies.”
Key findings:
- Many high-quality assemblies are derived from lymphoblastoid cell lines (LCLs); however, immunoglobulin loci in LCLs “contain a mixture of germline and somatically recombined haplotypes.”
- The authors’ tool, IGLoo, “profiles the somatic V(D)J recombination events in an LCL genome and measures their clonality. Then it improves the germline assembly of the IGH locus by removing the reads representing somatic haplotypes driven by V(D)J recombination and reassembling the dataset.”
- Applied to 47 Human Pangenome Reference Consortium (HPRC) samples: “On average, IGLoo reassembled IGH locus covered 10.8 more IGH genes per individual than HPRC year-1 assemblies.” The authors add that “we also found evidence of a range of known and novel non-canonical V(D)J recombination events.”
Conclusion:
Bioinformatics tools continue to improve, further leveraging the inherent accuracy and length of HiFi reads, both of which are required to resolve complex IG regions. Improved tools now allow researchers to distinguish germline structure from somatic rearrangements in lymphoblastoid cell lines, a widely used source of human cells in major consortium projects such as HPRC, HGSVC, 1KGP, HapMap, and GIAB. They also better resolve IG loci in other vertebrate species, which is important for comparative analyses.
Ready to kickstart breakthroughs of your own?
These recent publications exemplify the versatility and power of PacBio sequencing. From resolving the chromatin impact of mosaic variants to detecting de novo tandem repeat mutations, PacBio technology is enabling scientific pioneers to make transformative breakthroughs like never before.
PacBio sequencing is now more accessible for research teams of all sizes, thanks to new options for instrument financing or collaboration with certified service providers. To learn how to incorporate PacBio data into your next project: