Menu
Sprite decoration

Scientific publications

Publications featuring PacBio long-read + short-read sequencing data

Biorxiv  |  2024

Microflora Danica: the atlas of Danish environmental microbiomes

Singleton, Jensen, Delogu,Sørensen, Jørgensen, Karst, Yang, Knudsen, Sereika, Petriglieri, Knutsson, Dall, Kirkegaard, Kristensen, Woodcroft, Speth, Aroney, The Microflora Danica Consortium, Wagner, Dueholm, Nielsen, Albertsen

In this preprint, scientists from Denmark, Australia, and Austria, conducted a study where HiFi sequencing was used for rRNA operon sequencing for 449 (multiplexed in pools of 92 samples) microbiome samples (14.9 million bacterial (median 4,528 bp, containing both 16S and 23S) and 13.4 million eukaryotic rRNA operon sequences (median 4,035 bp, containing both 18S and 28S)). “This dataset is an order of magnitude larger than the current most comprehensive database SILVA 138.1”, and provides “an unprecedented resource and the foundation for answering fundamental questions underlying microbial ecology: what drives microbial diversity, distribution and function.”
Biorxiv  |  2023

Multi-omic profiling of pathogen-stimulated primary immune cells

Renee Salz, Emil E. Vorsteveld, Caspar I. van der Made, Simone Kersten, Merel Stemerdink, Tsung-han Hsieh, Musa Mhlanga, Mihai G. Netea, Pieter-Jan Volders, Alexander Hoischen, Peter A.C. ’t Hoen

Objectives To perform long-read transcriptome and proteome profiling of pathogen-stimulated peripheral blood mononuclear cells (PBMCs) from healthy donors. We aim to discover new transcripts and protein isoforms expressed during immune responses to diverse pathogens. Methods PBMCs were exposed to four microbial stimuli for 24 hours: the TLR4 ligand lipopolysaccharide (LPS), the TLR3 ligand Poly(I:C), heat-inactivated Staphylococcus aureus, Candida albicans, and RPMI medium as negative controls. Long-read sequencing (PacBio) of one donor and secretome proteomics and short-read sequencing of five donors were performed. IsoQuant was used for transcriptome construction, Metamorpheus/FlashLFQ for proteome analysis, and Illumina short-read 3’-end mRNA sequencing for transcript quantification. Results Long-read transcriptome profiling reveals the expression of novel sequences and isoform switching induced upon pathogen stimulation, including transcripts that are difficult to detect using traditional short-read sequencing. We observe widespread loss of intron retention as a common result of all pathogen stimulations. We highlight novel transcripts of NFKB1 and CASP1 that may indicate novel immunological mechanisms. In general, RNA expression differences did not result in differences in the amounts of secreted proteins. Interindividual differences in the proteome were larger than the differences between stimulated and unstimulated PBMCs. Clustering analysis of secreted proteins revealed a correlation between chemokine (receptor) expression on the RNA and protein levels in C. albicans- and Poly(I:C)-stimulated PBMCs. Conclusion Isoform aware long-read sequencing of pathogen-stimulated immune cells highlights the potential of these methods to identify novel transcripts, revealing a more complex transcriptome landscape than previously appreciated.
Nature Communications  |  2022

HiFi metagenomic sequencing enables assembly of accurate and complete genomes from human gut microbiota

Kim, Chan Yeong and Ma, Junyeong and Lee, Insuk

Metagenomics, Microbiology, Microbiome, Microbial community, HiFi, complete MAGs, circular contigs

Advances in metagenomic assembly have led to the discovery of genomes belonging to uncultured microorganisms. Metagenome-assembled genomes (MAGs) often suffer from fragmentation and chimerism. Recently, 20 complete MAGs (cMAGs) have been assembled from Oxford Nanopore long-read sequencing of 13 human fecal samples, but with low nucleotide accuracy. Here, we report 102 cMAGs obtained by Pacific Biosciences (PacBio) high-accuracy long-read (HiFi) metagenomic sequencing of five human fecal samples, whose initial circular contigs were selected for complete prokaryotic genomes using our bioinformatics workflow. Nucleotide accuracy of the final cMAGs was as high as that of Illumina sequencing. The cMAGs could exceed 6 Mbp and included complete genomes of diverse taxa, including entirely uncultured RF39 and TANB77 orders. Moreover, cMAGs revealed that regions hard to assemble by short-read sequencing comprised mostly genomic islands and rRNAs. HiFi metagenomic sequencing will facilitate cataloging accurate and complete genomes from complex microbial communities, including uncultured species.
Nature Methods  |  2022

Metagenome assembly of high-fidelity long reads with hifiasm-meta

Feng, Xiaowen and Cheng, Haoyu and Portik, Daniel and Li, Heng

Metagenomics, Microbiology, Microbiome, Microbial community, Metagenome, MAGs, assembly

De novo assembly of metagenome samples is a common approach to the study of microbial communities. Current metagenome assemblers developed for short sequence reads or noisy long reads were not optimized for accurate long reads. We thus developed hifiasm-meta, a metagenome assembler that exploits the high accuracy of recent data. Evaluated on seven empirical datasets, hifiasm-meta reconstructed tens to hundreds of complete circular bacterial genomes per dataset, consistently outperforming other metagenome assemblers.
Microbial Genomics  |  2022

Finding the right fit: Evaluation of short-read and long-read sequencing approaches to maximize the utility of clinical microbiome data

Gehrig, Jeanette L and Portik, Daniel M and Driscoll, Mark D and Jackson, Eric and Chakraborty, Shreyasee and Gratalo, Dawn and Ashby, Meredith and Valladares, Ricardo

Microbiome, Microbial community, 16S rRNA, live biotherapeutic product, shotgun metagenomics, MAGs

A long-standing challenge in human microbiome research is achieving the taxonomic and functional resolution needed to generate testable hypotheses about the gut microbiota's impact on health and disease. With a growing number of live microbial interventions in clinical development, this challenge is renewed by a need to understand the pharmacokinetics and pharmacodynamics of therapeutic candidates. While short-read sequencing of the bacterial 16S rRNA gene has been the standard for microbiota profiling, recent improvements in the fidelity of long-read sequencing underscores the need for a re-evaluation of the value of distinct microbiome-sequencing approaches. We leveraged samples from participants enrolled in a phase 1b clinical trial of a novel live biotherapeutic product to perform a comparative analysis of short-read and long-read amplicon and metagenomic sequencing approaches to assess their utility for generating clinical microbiome data. Across all methods, overall community taxonomic profiles were comparable and relationships between samples were conserved. Comparison of ubiquitous short-read 16S rRNA amplicon profiling to long-read profiling of the 16S-ITS-23S rRNA amplicon showed that only the latter provided strain-level community resolution and insight into novel taxa. All methods identified an active ingredient strain in treated study participants, though detection confidence was higher for long-read methods. Read coverage from both metagenomic methods provided evidence of active-ingredient strain replication in some treated participants. Compared to short-read metagenomics, approximately twice the proportion of long reads were assigned functional annotations. Finally, compositionally similar bacterial metagenome-assembled genomes (MAGs) were recovered from short-read and long-read metagenomic methods, although a greater number and more complete MAGs were recovered from long reads. Despite higher costs, both amplicon and metagenomic long-read approaches yielded added microbiome data value in the form of higher confidence taxonomic and functional resolution and improved recovery of microbial genomes compared to traditional short-read methodologies.
Nature Biotechnology  |  2022

Generating lineage-resolved, complete metagenome-assembled genomes from complex microbial communities

Bickhart, Derek M. and Kolmogorov, Mikhail and Tseng, Elizabeth and Portik, Daniel M. And Korobeynikov, Anton and Tolstoganov, Ivan and Uritskiy, Gherman and Liachko, Ivan and Sullivan, Shawn T. and Shin, Sung Bong and Zorea, Alvah and Andreu, Victoria Pascal and Panke-Buisse, Kevin and Medema, Marnix H. and Mizrahi, Itzhak and Pevzner, Pavel A. and Smith, Timothy P.

Metagenomics, Microbiology, Microbiome

Microbial communities might include distinct lineages of closely related organisms that complicate metagenomic assembly and prevent the generation of complete metagenome-assembled genomes (MAGs). Here we show that deep sequencing using long (HiFi) reads combined with Hi-C binning can address this challenge even for complex microbial communities. Using existing methods, we sequenced the sheep fecal metagenome and identified 428 MAGs with more than 90% completeness, including 44 MAGs in single circular contigs. To resolve closely related strains (lineages), we developed MAGPhase, which separates lineages of related organisms by discriminating variant haplotypes across hundreds of kilobases of genomic sequence. MAGPhase identified 220 lineage-resolved MAGs in our dataset. The ability to resolve closely related microbes in complex microbial communities improves the identification of biosynthetic gene clusters and the precision of assigning mobile genetic elements to host genomes. We identified 1,400 complete and 350 partial biosynthetic gene clusters, most of which are novel, as well as 424 (298) potential host–viral (host–plasmid) associations using Hi-C data.
Annals of human genetics  |  2019

Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads.

Vollger, Mitchell R and Logsdon, Glennis A and Audano, Peter A and Sulovari, Arvis and Porubsky, David and Peluso, Paul and Wenger, Aaron M and Concepcion, Gregory T and Kronenberg, Zev N and Munson, Katherine M and Baker, Carl and Sanders, Ashley D and Spierings, Diana C J and Lansdorp, Peter M and Surti, Urvashi and Hunkapiller, Michael W and Eichler, Evan E

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes. © 2019 John Wiley & Sons Ltd/University College London.
Applied and environmental microbiology  |  2019

Relative Performance of MinION (Oxford Nanopore Technologies) versus Sequel (Pacific Biosciences) Third-Generation Sequencing Instruments in Identification of Agricultural and Forest Fungal Pathogens.

Loit, Kaire and Adamson, Kalev and Bahram, Mohammad and Puusepp, Rasmus and Anslan, Sten and Kiiker, Riinu and Drenkhan, Rein and Tedersoo, Leho

Culture-based molecular identification methods have revolutionized detection of pathogens, yet these methods are slow and may yield inconclusive results from environmental materials. The second-generation sequencing tools have much-improved precision and sensitivity of detection, but these analyses are costly and may take several days to months. Of the third-generation sequencing techniques, the portable MinION device (Oxford Nanopore Technologies) has received much attention because of its small size and possibility of rapid analysis at reasonable cost. Here, we compare the relative performances of two third-generation sequencing instruments, MinION and Sequel (Pacific Biosciences), in identification and diagnostics of fungal and oomycete pathogens from conifer (Pinaceae) needles and potato (Solanum tuberosum) leaves and tubers. We demonstrate that the Sequel instrument is efficient for metabarcoding of complex samples, whereas MinION is not suited for this purpose due to a high error rate and multiple biases. However, we find that MinION can be utilized for rapid and accurate identification of dominant pathogenic organisms and other associated organisms from plant tissues following both amplicon-based and PCR-free metagenomics approaches. Using the metagenomics approach with shortened DNA extraction and incubation times, we performed the entire MinION workflow, from sample preparation through DNA extraction, sequencing, bioinformatics, and interpretation, in 2.5 h. We advocate the use of MinION for rapid diagnostics of pathogens and potentially other organisms, but care needs to be taken to control or account for multiple potential technical biases.IMPORTANCE Microbial pathogens cause enormous losses to agriculture and forestry, but current combined culturing- and molecular identification-based detection methods are too slow for rapid identification and application of countermeasures. Here, we develop new and rapid protocols for Oxford Nanopore MinION-based third-generation diagnostics of plant pathogens that greatly improve the speed of diagnostics. However, due to high error rate and technical biases in MinION, the Pacific BioSciences Sequel platform is more useful for in-depth amplicon-based biodiversity monitoring (metabarcoding) from complex environmental samples.Copyright © 2019 American Society for Microbiology.
GigaScience  |  2019

A high-quality genome assembly from a single, field-collected spotted lanternfly (Lycorma delicatula) using the PacBio Sequel II system

Kingan, Sarah B and Urban, Julie and Lambert, Christine C and Baybayan, Primo and Childers, Anna K and Coates, Brad and Scheffler, Brian and Hackett, Kevin and Korlach, Jonas and Geib, Scott M

Background A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. Results The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ~20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ~36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. Conclusions We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
Nucleic acids research  |  2019

High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution.

Callahan, Benjamin J and Wong, Joan and Heiner, Cheryl and Oh, Steve and Theriot, Casey M and Gulati, Ajay S and McGill, Sarah K and Dougherty, Michael K

Targeted PCR amplification and high-throughput sequencing (amplicon sequencing) of 16S rRNA gene fragments is widely used to profile microbial communities. New long-read sequencing technologies can sequence the entire 16S rRNA gene, but higher error rates have limited their attractiveness when accuracy is important. Here we present a high-throughput amplicon sequencing methodology based on PacBio circular consensus sequencing and the DADA2 sample inference method that measures the full-length 16S rRNA gene with single-nucleotide resolution and a near-zero error rate. In two artificial communities of known composition, our method recovered the full complement of full-length 16S sequence variants from expected community members without residual errors. The measured abundances of intra-genomic sequence variants were in the integral ratios expected from the genuine allelic variants within a genome. The full-length 16S gene sequences recovered by our approach allowed Escherichia coli strains to be correctly classified to the O157:H7 and K12 sub-species clades. In human fecal samples, our method showed strong technical replication and was able to recover the full complement of 16S rRNA alleles in several E. coli strains. There are likely many applications beyond microbial profiling for which high-throughput amplicon sequencing of complete genes with single-nucleotide resolution will be of use. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
Nucleic acids research  |  2019

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases.

Tørresen, Ole K and Star, Bastiaan and Mier, Pablo and Andrade-Navarro, Miguel A and Bateman, Alex and Jarnot, Patryk and Gruca, Aleksandra and Grynberg, Marcin and Kajava, Andrey V and Promponas, Vasilis J and Anisimova, Maria and Jakobsen, Kjetill S and Linke, Dirk

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
Nature communications  |  2019

Long-read assembly of the Chinese rhesus macaque genome and identification of ape-specific structural variants.

He, Yaoxi and Luo, Xin and Zhou, Bin and Hu, Ting and Meng, Xiaoyu and Audano, Peter A and Kronenberg, Zev N and Eichler, Evan E and Jin, Jie and Guo, Yongbo and Yang, Yanan and Qi, Xuebin and Su, Bing

We present a high-quality de novo genome assembly (rheMacS) of the Chinese rhesus macaque (Macaca mulatta) using long-read sequencing and multiplatform scaffolding approaches. Compared to the current Indian rhesus macaque reference genome (rheMac8), rheMacS increases sequence contiguity 75-fold, closing 21,940 of the remaining assembly gaps (60.8 Mbp). We improve gene annotation by generating more than two million full-length transcripts from ten different tissues by long-read RNA sequencing. We sequence resolve 53,916 structural variants (96% novel) and identify 17,000 ape-specific structural variants (ASSVs) based on comparison to ape genomes. Many ASSVs map within ChIP-seq predicted enhancer regions where apes and macaque show diverged enhancer activity and gene expression. We further characterize a subset that may contribute to ape- or great-ape-specific phenotypic traits, including taillessness, brain volume expansion, improved manual dexterity, and large body size. The rheMacS genome assembly serves as an ideal reference for future biomedical and evolutionary studies.
Nature biotechnology  |  2019

Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome.

Wenger, Aaron M and Peluso, Paul and Rowell, William J and Chang, Pi-Chuan and Hall, Richard J and Concepcion, Gregory T and Ebler, Jana and Fungtammasan, Arkarachai and Kolesnikov, Alexey and Olson, Nathan D and Töpfer, Armin and Alonge, Michael and Mahmoud, Medhat and Qian, Yufeng and Chin, Chen-Shan and Phillippy, Adam M and Schatz, Michael C and Myers, Gene and DePristo, Mark A and Ruan, Jue and Marschall, Tobias and Sedlazeck, Fritz J and Zook, Justin M and Li, Heng and Koren, Sergey and Carroll, Andrew and Rank, David R and Hunkapiller, Michael W

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5?kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15?megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
Genome biology  |  2019

Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation.

Bickhart, Derek M and Watson, Mick and Koren, Sergey and Panke-Buisse, Kevin and Cersosimo, Laura M and Press, Maximilian O and Van Tassell, Curtis P and Van Kessel, Jo Ann S and Haley, Bradd J and Kim, Seon Woo and Heiner, Cheryl and Suen, Garret and Bakshy, Kiranmayee and Liachko, Ivan and Sullivan, Shawn T and Myer, Phillip R and Ghurye, Jay and Pop, Mihai and Weimer, Paul J and Phillippy, Adam M and Smith, Timothy P L

We describe a method that adds long-read sequencing to a mix of technologies used to assemble a highly complex cattle rumen microbial community, and provide a comparison to short read-based methods. Long-read alignments and Hi-C linkage between contigs support the identification of 188 novel virus-host associations and the determination of phage life cycle states in the rumen microbial community. The long-read assembly also identifies 94 antimicrobial resistance genes, compared to only seven alleles in the short-read assembly. We demonstrate novel techniques that work synergistically to improve characterization of biological features in a highly complex rumen microbial community.
Environmental microbiology  |  2019

Confident phylogenetic identification of uncultured prokaryotes through long read amplicon sequencing of the 16S-ITS-23S rRNA operon.

Martijn, Joran and Lind, Anders E and Schön, Max E and Spiertz, Ian and Juzokaite, Lina and Bunikis, Ignas and Pettersson, Olga V and Ettema, Thijs J G

Amplicon sequencing of the 16S rRNA gene is the predominant method to quantify microbial compositions and to discover novel lineages. However, traditional short amplicons often do not contain enough information to confidently resolve their phylogeny. Here we present a cost-effective protocol that amplifies a large part of the rRNA operon and sequences the amplicons with PacBio technology. We tested our method on a mock community and developed a read-curation pipeline that reduces the overall read error rate to 0.18%. Applying our method on four environmental samples, we captured near full-length rRNA operon amplicons from a large diversity of prokaryotes. The method operated at moderately high-throughput (22286-37,850 raw ccs reads) and generated a large amount of putative novel archaeal 23S rRNA gene sequences compared to the archaeal SILVA database. These long amplicons allowed for higher resolution during taxonomic classification by means of long (~1000 bp) 16S rRNA gene fragments and for substantially more confident phylogenies by means of combined near full-length 16S and 23S rRNA gene sequences, compared to shorter traditional amplicons (250 bp of the 16S rRNA gene). We recommend our method to those who wish to cost-effectively and confidently estimate the phylogenetic diversity of prokaryotes in environmental samples at high throughput. © 2019 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd.
Advanced search
Author search
Year search

Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.