Scientific publications

Publications featuring PacBio long-read + short-read sequencing data

bioRxiv  |  2025

CiFi: Accurate long-read chromatin conformation capture with low-input requirements

Sean P McGinty, Gulhan Kaya, Sheina B. Sim, Renée Lynn Corpuz, Michael A Quail, Mara KN Lawniczak, Scott M Geib, Jonas Korlach, Megan Y Dennis

By coupling chromatin conformation capture (3C) with PacBio HiFi long-read sequencing, we have developed a new method (CiFi) that enables analysis of genome interactions across repetitive genomic regions with low-input requirements. CiFi produces multiple interacting concatemer segments per read, facilitating genome assembly and scaffolding. Together, the approach enables genomic analysis of previously recalcitrant low-complexity loci, and of small organisms such as single insect individuals.
Nature  |  2025

A Near Complete Genome Assembly of the Oshima Cherry Cerasus speciosa

Kazumichi Fujiwara, Atsushi Toyoda, Bhim B. Biswa, Takushi Kishida, Momi Tsuruta, Yasukazu Nakamura, Noriko Kimura, Shoko Kawamoto, Yutaka Sato, Toshio Katsuki, Sakura 100 Genome Consortium & Tsuyoshi Koide

The Oshima cherry (Cerasus speciosa), which is endemic to Japan, has significant cultural and horticultural value. In this study, we present a near complete telomere-to-telomere genome assembly for C. speciosa, derived from the old growth “Sakurakkabu” tree on Izu Oshima Island. Using Illumina short-read, PacBio long-read, and Hi-C sequencing, we constructed a 269.3 Mbp genome assembly with a contig N50 of 32.0 Mbp. We examined the distribution of repetitive sequences in the assembled genome and identified regions that appeared to be centromeric. Detailed structural analysis of these putative centromeric regions revealed that the centromeric regions of C. speciosa comprised repetitive sequences with monomer lengths of 166 or 167 bp. Comparative genomic analysis with Prunus sensu lato genome revealed structural variations and conserved syntenic regions. This high-quality reference genome provides a crucial tool for studying the genetic diversity and evolutionary history of Cerasus species, facilitating advancements in horticultural research and the preservation of this iconic species.
Nature  |  2025

Seasonal recurrence and modular assembly of an Arctic pelagic marine microbiome

Taylor Priest, Ellen Oldenburg, Ovidiu Popa, Bledina Dede, Katja Metfies, Wilken-Jon von Appen, Sinhué Torres-Valdés, Christina Bienhold, Bernhard M. Fuchs, Rudolf Amann, Antje Boetius & Matthias Wietz

Deciphering how microbial communities are shaped by environmental variability is fundamental for understanding the structure and function of ocean ecosystems. While seasonal environmental gradients have been shown to structure the taxonomic dynamics of microbiomes over time, little is known about their impact on functional dynamics and the coupling between taxonomy and function. Here, we demonstrate annually recurrent, seasonal structuring of taxonomic and functional dynamics in a pelagic Arctic Ocean microbiome by combining autonomous samplers and in situ sensors with long-read metagenomics and SSU ribosomal metabarcoding. Specifically, we identified five temporal microbiome modules whose succession within each annual cycle represents a transition across different ecological states. For instance, Cand. Nitrosopumilus, Syndiniales, and the machinery to oxidise ammonia and reduce nitrite are signatures of early polar night, while late summer is characterised by Amylibacter and sulfur compound metabolism. Leveraging metatranscriptomes from Tara Oceans, we also demonstrate the consistency in functional dynamics across the wider Arctic Ocean during similar temporal periods. Furthermore, the structuring of genetic diversity within functions over time indicates that environmental selection pressure acts heterogeneously on microbiomes across seasons. By integrating taxonomic, functional and environmental information, our study provides fundamental insights into how microbiomes are structured under pronounced seasonal changes in understudied, yet rapidly changing polar marine ecosystems.
Nature  |  2025

Synchronized long-read genome, methylome, epigenome and transcriptome profiling resolve a Mendelian condition

Mitchell R. Vollger, Jonas Korlach, Kiara C. Eldred, Elliott Swanson, Jason G. Underwood, Stephanie C. Bohaczuk, Yizi Mao, Yong-Han H. Cheng, Jane Ranchalis, Elizabeth E. Blue, Ulrike Schwarze, Katherine M. Munson, Christopher T. Saunders, Aaron M. Wenger, Aimee Allworth, Sirisak Chanprasert, Brittney L. Duerden, Ian Glass, Martha Horike-Pyne, Michelle Kim, Kathleen A. Leppig, Ian J. McLaughlin, Jessica Ogawa, Elisabeth A. Rosenthal, University of Washington Center for Rare Disease Research, Undiagnosed Diseases Network, …Andrew B. Stergachis

Resolving the molecular basis of a Mendelian condition remains challenging owing to the diverse mechanisms by which genetic variants cause disease. To address this, we developed a synchronized long-read genome, methylome, epigenome and transcriptome sequencing approach, which enables accurate single-nucleotide, insertion–deletion and structural variant calling and diploid de novo genome assembly. This permits the simultaneous elucidation of haplotype-resolved CpG methylation, chromatin accessibility and full-length transcript information in a single long-read sequencing run. Application of this approach to an Undiagnosed Diseases Network participant with a chromosome X;13-balanced translocation of uncertain significance revealed that this translocation disrupted the functioning of four separate genes (NBEA, PDK3, MAB21L1 and RB1) previously associated with single-gene Mendelian conditions. Notably, the function of each gene was disrupted via a distinct mechanism that required integration of the four ‘omes’ to resolve. These included fusion transcript formation, enhancer adoption, transcriptional readthrough silencing and inappropriate X-chromosome inactivation of autosomal genes. Overall, this highlights the utility of synchronized long-read multi-omic profiling for mechanistically resolving complex phenotypes.
Nature  |  2025

Evaluating the efficiency of 16S-ITS-23S operon sequencing for species level resolution in microbial communities

Rapid advancements in long-read sequencing have facilitated species-level microbial profiling through full-length 16S rRNA sequencing (~1500 bp), and more notably, by the newer 16S-ITS-23S ribosomal RNA operon (RRN) sequencing (~4500 bp). RRN sequencing is emerging as a superior method for species resolution, exceeding the capabilities of short-read and full-length 16S rRNA sequencing. However, being in its early stages of development, RRN sequencing has several underexplored or understudied elements, highlighting the need for a critical and thorough examination of its methodologies. Key areas that require detailed analysis include understanding how primer pairs, sequencing platforms, and classifiers and databases affect the accuracy of species resolution achieved through RRN sequencing. Our study addresses these gaps by evaluating the effect of primer pairs using four RRN primer combinations, and that of sequencing platforms by employing PacBio and Oxford Nanopore Technologies (ONT) systems. Furthermore, two classification methods (Minimap2 and OTU clustering), in combination with four RRN reference databases (MIrROR, rrnDB, and two versions of GROND), were compared to identify consistent and accurate classification methods with RRN sequencing. Here we demonstrate that RRN primer pair choice and sequencing platform do not substantially bias taxonomic profiles for most of the tested mock communities, while classification methods significantly impact the accuracy of species-level assignments. Of the classification methods tested, the Minimap2 classifier in combination with the GROND database most consistently provided accurate species-level classification across the communities tested, irrespective of sequencing platform.
Clinical Genetics  |  2025

Haplotype Phasing of Biallelic WNT10B Variants Using Long-Read Sequencing in Split-Hand/Foot Malformation Syndrome

Jelena Pozojevic, Naseebullah Kakar, Henrike L. Sczakiel, Nathalie Kruse, Kristian Händler, Saranya Balachandran, Varun Sreenivasan, Martin A. Mensah, Malte Spielmann

Split-hand/foot malformation syndrome (SHFM) is a congenital limb malformation that is both clinically and genetically heterogeneous. Variants in WNT10B are known to cause an autosomal recessive form of SHFM. Here, we report a patient born to unrelated parents who was found to be a compound heterozygote for missense variants in WNT10B: c.994C>T, p.(Arg332Trp) and c.638T>G, p.(Phe213Cys). The variants were identified using long-read PacBio sequencing, which enabled phasing and confirmed that they were located on different alleles. The maternally inherited variant p.(Arg332Trp) has been previously reported, whereas the paternally inherited variant p.(Phe213Cys) is novel and absent from the gnomAD database. Our findings highlight the utility of long-read haplotype phasing, which provides valuable insights in determining the biallelic nature of variants in recessive disorders when parental DNA samples are unavailable.
bioRxiv  |  2025

Long-Read Low-Pass Sequencing for High-Resolution Trait Mapping

Kendall Lee, Walid Korani, Philip C. Bentz, Sameer Pokhrel, Peggy Ozias-Akins, Alex Harkess, Justin Vaughn, Josh Clevenger

Accelerating crop improvement is critical to meeting food security demands in a changing climate. Long-read sequencing offers advantages over short-reads in resolving structural variations (SVs) and aligning to complex genomes, but its high cost has limited adoption in breeding programs. Here we develop a high-throughput, scalable approach for long-read low-pass (LRLP) sequencing and variant analysis with PacBio HiFi reads, and apply it to trait mapping in a complex tetraploid peanut (Arachis hypogaea) genome multi-parent advanced generation intercross. We analyze LRLP using both a single reference genome and a pangraph, using both proprietary and open-source tools to analyze SVs and coverage. An increased number of variants are consistently called for LRLP data compared to short-read data. At 1.63x average depth, LRLP sequencing covered 55% of the genome and 58% of gene space, outperforming 1.68x depth short-read low-pass sequencing, which achieved only 17% and 11%, respectively. Enhanced data retention after filtering for probabilistic misalignment and an ∼8.5x decrease in cost per value further demonstrated LRLP’s efficacy. Our results highlight LRLP sequencing as a scalable, cost-effective tool for high-resolution trait mapping, with transformative potential for plant breeding and broader genomic applications.
bioRxiv  |  2025

Epigenetic phase variation in the gut microbiome enhances bacterial adaptation

Mi Ni, Yu Fan, Yujie Liu, Yangmei Li, Wanjin Qiao, Lauren E. Davey, Xue-Song Zhang, Magdalena Ksiezarek, Edward Mead, Alan Tourancheau, Wenyan Jiang, Martin J. Blaser, Raphael H. Valdivia, Gang Fang

The human gut microbiome within the gastrointestinal tract continuously adapts to variations in diet, medications, and host physiology. A central strategy for genetic adaptation is epigenetic phase variation (ePV) mediated by bacterial DNA methylation, which can regulate gene expression, enhance clonal heterogeneity, and enable a single bacterial strain to exhibit variable phenotypic states. Genome-wide and site-specific ePV have been well characterized in human pathogens’ antigenic variation and virulence factor production. However, the role of ePV in facilitating adaptation within the human microbiome remains poorly understood. Here, we comprehensively cataloged genome-wide and site-specific ePV in human infant and adult gut microbiomes. First, using long-read metagenomic sequencing, we detected genome-wide ePV mediated by complex structural variations of DNA methyltransferases, highlighting the ones associated with antibiotics or fecal microbiota transplantation. Second, we analyzed an extensive collection of public short-read metagenomic sequencing datasets, uncovering a greater prevalence of genome-wide ePV in the human gut microbiome. Third, we quantitatively detected site-specific ePVs using single-molecule methylation analysis to identify dynamic variations associated with antibiotic treatment or probiotic engraftment. Finally, we performed an in-depth assessment of an Akkermansia muciniphila isolate from an infant, highlighting that ePV can regulate gene expression and enhance the bacterial adaptive capacity by employing a bet-hedging strategy to increase tolerance to differing antibiotics. Our findings indicate that epigenetic modifications are a common and broad strategy used by bacteria in the human gut to adapt to their environment.
bioRxiv  |  2025

Detailed tandem repeat allele profiling in 1,027 long-read genomes reveals genome-wide patterns of pathogenicity

Matt C. Danzi, Isaac R. L. Xu, Sarah Fazal, Egor Dolzhenko, David Pellerin, Ben Weisburd, Chloe Reuter, Jacinda Sampson, Chiara Folland, Matthew Wheeler, Anne O’Donnell-Luria, Stefan Wuchty, Gianina Ravenscroft, Michael A. Eberle, All of Us Research Program Long Read Working Group, Stephan Zuchner

Tandem repeats are a highly polymorphic class of genomic variation that play causal roles in rare diseases but are notoriously difficult to sequence using short-read techniques [1,2]. Most previous studies profiling tandem repeats genome-wide have reduced the description of each locus to the singular value of the length of the entire repetitive locus [3,4]. Here we introduce a comprehensive database of 3.6 billion tandem repeat allele sequences from over one thousand individuals using HiFi long-read sequencing. We show that the previously identified pathogenic loci are among the most variable tandem repeat loci in the genome, when incorporating nucleotide resolution sequence content to measure the longest pure motif segment. More broadly, we introduce a novel measure, ‘tandem repeat constraint’, that assists in distinguishing potentially pathogenic from benign loci. Our approach of measuring variation as ‘the length of the longest pure segment’ successfully prioritizes pathogenic repeats within their previously published linkage regions. We also present evidence for two novel pathogenic repeat expansion candidates. In summary, this analysis significantly clarifies the potential for short tandem repeat pathogenicity at over 1.7 million tandem repeat loci and will aid the identification of disease-causing repeat expansions.
Genome Research  |  2024

Evaluation of strategies for evidence-driven genome annotation using long-read RNA-seq

Alejandro Paniagua, Cristina Agustin-García, Francisco J Pardo-Palacios, Thomas Brown, Maite De Maria, Nancy D Denslow, Camila Mazzoni and Ana Conesa

While the production of a draft genome has become more accessible due to long-read sequencing, the annotation of these new genomes has not been developed at the same pace. Long-read RNA sequencing (lrRNA-seq) offers a promising solution for enhancing gene annotation. In this study, we explore how sequencing platforms, Oxford Nanopore R9.4.1 chemistry or PacBio Sequel II CCS, and data processing methods influence evidence-driven genome annotation using long reads. Incorporating PacBio transcripts into our annotation pipeline significantly outperformed traditional methods, such as ab initio predictions and short-read-based annotations. We applied this strategy to a nonmodel species, the Florida manatee, and compared our results to existing short-read-based annotation. At the loci level, both annotations were highly concordant, with 90% agreement. However, at the transcript level, the agreement was only 35%. We identified 4,906 novel loci, represented by 5,707 isoforms, with 64% of these isoforms matching known sequences in other mammalian species. Overall, our findings underscore the importance of using high-quality curated transcript models in combination with ab initio methods for effective genome annotation.
bioRxiv  |  2024

An Emirati pangenome incorporating a diploid telomere-to-telomere reference

Michael Olbrich, Mira Mousa, Inken Wohlers, Amira Al Aamri, Halima Alnaqbi, Aisha Hanaya Alsuwaidi, Hima Vadakkeveettil Manoharan, Nour al-dain Marzouka, Sanjay Erathodi Ramachandran, Anju Annie Thomas, Mohammed Alameri, Guan K Tay, Rifat Hamoudi, Saleh Ibrahim, Noura Al Ghaithi, Habiba Alsafar

Reference data on genomic variation forms the basis of genetics research. Limitations in identifying genetic variation from single reference sequences have recently been addressed through improvements in sequencing technologies, allowing the generation of pangenomic references from multiple accurate chromosome-level de novo assemblies. Nevertheless, global pangenomes to date have yet to include genomes from the populations of the Middle Eastern Region. To address this shortcoming, this study provides an Emirati genome reference. Its core is a diploid assembly with a Quality Value (QV) of 60 that includes ten telomere-to-telomere chromosomes. This assembly is incorporated into a pangenome graph constructed of 52 additional high-quality assemblies, half of which are trio-based. This Emirati pangenome reveals a similar level of genomic variation as the one compiled by the Human Pangenome Reference Consortium, underscoring its utility for the identification of both global and population-centered genomic variation, even in genome regions that have been traditionally challenging to assemble but are covered by the Emirati telomere-to-telomere assembly. As such, the Emirati genome reference significantly contributes to genomic research globally and is an essential resource for genomics-based personalized medicine in the United Arab Emirates and other parts of the Middle East.
npj genomic medicine  |  2024

Long-read genome sequencing resolves complex genomic rearrangements in rare genetic syndromes

Iftekhar A. Showpnil, Maria E. Hernandez Gonzalez, Swetha Ramadesikan, Mohammad Marhabaie, Allison Daley, Leeran Dublin-Ryan, Matthew T. Pastore, Umamaheswaran Gurusamy, Jesse M. Hunter, Brandon S. Stone, Dennis W. Bartholomew, Kandamurugu Manickam, Anthony R. Miller, Richard K. Wilson, Rolf W. Stottmann & Daniel C. Koboldt

Long-read sequencing can often overcome the deficiencies in routine microarray or short-read technologies in detecting complex genomic rearrangements. Here we used Pacific Biosciences circular consensus sequencing to resolve complex rearrangements in two patients with rare genetic anomalies. Copy number variants (CNVs) identified by clinical microarray (chr8p deletion and chr8q duplication in patient 1, and interstitial deletions of chr18q in patient 2) were suggestive of underlying rearrangements. Long-read genome sequencing not only confirmed these CNVs but also revealed their genomic structures. In patient 1, we resolved a novel recombinant chromosome 8 (Rec8)-like rearrangement with a 3.43 Mb chr8q terminal duplication that was linked to a 7.25–8.21 Mb chr8p terminal deletion. In patient 2, we uncovered a novel complex rearrangement involving a 1.17 Mb rearranged segment and four interstitial deletions ranging from 9 bp to 12.39 Mb. Our results underscore the diversity of clinically relevant structural rearrangements and the power of long-read sequencing in unraveling their nuanced architectures.
Cell Symposia  |  2024

Single chromatin fiber profiling and nucleosome position mapping in the human brain

Cyril J. Peter, Aman Agarwal, Risa Watanabe, Andrew B. Stergachis, Dan Hasson, Schahram Akbarian (see article for full authors and affiliations)

Nucleosomes are the basic unit of chromatin, but genome-scale maps of nucleosome positioning along chromatin fibers do not exist for brain and many other complex tissues. Conventional chromatin accessibility assays designed to map nucleosome-depleted regions via nucleolytic digestion face major limitations such as limited resolution and sequence bias, with additional shortcomings from PCR-generated short-read libraries including poor annotation for an estimated 50% of the human genome. To address this, we tested a brain-adapted single-molecule chromatin fiber sequencing (Fiber-seq) protocol designed for amplification-free adenine-methyltransferase tagging of extranucleosomal DNA in neuronal and, separately, non-neuronal nuclei in situ.
Genome Biology and Evolution  |  2024

Multiple Displacement Amplification Facilitates SMRT Sequencing of Microscopic Animals and the Genome of the Gastrotrich Lepidodermella squamata

Nickellaus G Roberts, Michael J Gilmore, Torsten H Struck, Kevin M Kocot

Obtaining adequate DNA for long-read genome sequencing remains a roadblock to producing contiguous genomes from small-bodied organisms, hindering understanding of phylogenetic relationships and genome evolution. Multiple displacement amplification leverages Phi29 DNA polymerase to produce micrograms of DNA from picograms of input. However, multiple displacement amplification's inherent biases in amplification related to guanine and cytosine (GC) content, repeat content and chimera production are a problem for long-read genome assembly, which has been little investigated. We explored the utility of multiple displacement amplification for generating template DNA for High Fidelity (HiFi) sequencing directly from living cells of Caenorhabditis elegans (Nematoda) and Lepidodermella squamata (Gastrotricha) containing one order of magnitude less DNA than required for the PacBio Ultra-Low DNA Input Workflow. High Fidelity sequencing of libraries prepared from multiple displacement amplification products resulted in highly contiguous and complete genomes for both C. elegans (102 Mbp assembly; 336 contigs; N50 = 868 kbp; L50 = 39; BUSCO_nematoda_nucleotide: S:96.1%, D:2.8%) and L. squamata (122 Mbp assembly; 157 contigs; N50 = 3.9 Mbp; L50 = 13; BUSCO_metazoa_nucleotide: S:80.8%, D:2.8%). Coverage uniformity for reads from multiple displacement amplification DNA (Gini Index: 0.14, normalized mean across all 100 kbp blocks: 0.49) and reads from pooled nematode DNA (Gini Index: 0.16, normalized mean across all 100 kbp blocks: 0.49) proved similar. Using this approach, we sequenced the genome of the microscopic invertebrate L. squamata (Gastrotricha), the first of its phylum. Using the newly sequenced genome, we infer Gastrotricha's long-debated phylogenetic position as the sister taxon of Platyhelminthes and conduct a comparative analysis of the Hox cluster.
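The assembly statistics quoted in this abstract (N50, L50, and a Gini index for coverage uniformity) can be illustrated with a minimal sketch. It assumes the standard definitions: N50 is the length of the contig at which the cumulative length of contigs, sorted longest first, reaches half the assembly size; L50 is the number of contigs needed to get there; and the Gini index is computed with the usual sorted-rank formula. The function names `n50_l50` and `gini_index` are hypothetical, and the authors' exact procedure (for example, how per-block coverage was normalized) is not described here.

```python
from typing import List, Sequence, Tuple


def n50_l50(contig_lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths in base pairs."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    ordered = sorted(contig_lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the N50; 'count' contigs were needed, so it is the L50.
            return length, count
    raise AssertionError("unreachable: cumulative sum always reaches half")


def gini_index(values: Sequence[float]) -> float:
    """Gini index of non-negative values (0 = perfectly uniform).

    Assumed use here: 'values' are mean read depths per fixed-size block.
    """
    ordered = sorted(values)  # ascending order for the rank-weighted sum
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        raise ValueError("values must be non-empty with a positive sum")
    weighted = sum(rank * v for rank, v in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Toy contig lengths, not data from the study.
    print(n50_l50([2, 2, 2, 3, 3, 4, 8, 8]))
    print(gini_index([10.0, 12.0, 9.0, 11.0]))
```

A lower Gini index indicates more even coverage, which is why similar values for the amplified and pooled samples are read as comparable uniformity.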
medRxiv  |  2024

Comprehensive genetic analysis of STRC variants in hereditary hearing impairment using long-read sequencing

Cheng-Yu Tsai, Yue-Sheng Lu, Yu-Ting Chiang, Ming-Yu Lo, Pei-Hsuan Lin, Shih-Feng Tsai, Chuan-Jen Hsu, Pei-Lung Chen, Jacob Shu-Jui Hsu, Chen-Chi Wu

This study represents the first large-scale clinical investigation utilizing long-read sequencing (LRS) technology for the genetic diagnosis of sensorineural hearing impairment (SNHI). Our study highlights the diagnostic capabilities of LRS in detecting complex variants within STRC and in advancing our understanding of the genetic etiology of SNHI that remains unresolved by conventional next-generation sequencing (NGS).