More than 150 SMRT Sequencing users gathered at Stanford University for our annual West Coast User Meeting & workshops earlier this month. Many thanks to all the scientists who attended and shared their research. For anyone who couldn’t make it, we’ve included some highlights from each talk below (and links to download the full presentations when possible):
The event began with Marty Badgett, our senior product manager for the Sequel System and the PacBio RS II, discussing recent technology updates. He presented the most recent results from the Sequel System, highlighting resequencing and small and large genome applications. Specifically, three targeted sequencing datasets (two metagenomic and one immunology) demonstrated high single-molecule accuracy, with over 225,000 reads each at >QV30. Next up were large-insert libraries, with data from bacterial, plant, and animal projects. These showcased the nearly 7-fold increase in the number of reads from the larger Sequel SMRT Cell, with half of the data coming from reads >14,500 bp. Finally, Marty showed the history of our development efforts on the PacBio RS platform and how we are applying those lessons to future developments on the Sequel System, including improvements to read length and reductions in input library amounts.
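For readers less familiar with quality values, here is a minimal sketch of how a QV such as the QV30 threshold above maps to accuracy, assuming the standard Phred convention (QV = -10 log10 of the error probability). The function names are illustrative and are not part of any PacBio software.

```python
import math


def qv_to_error_probability(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def accuracy_to_qv(accuracy: float) -> float:
    """Convert an accuracy in [0, 1) to a Phred-scaled quality value."""
    if not 0 <= accuracy < 1:
        raise ValueError("accuracy must be in [0, 1)")
    return -10 * math.log10(1 - accuracy)


if __name__ == "__main__":
    error = qv_to_error_probability(30)
    print(f"QV30 error probability: {error:.4f}")
    print(f"QV30 accuracy: {1 - error:.2%}")
```

Under this convention, a read at QV30 is expected to contain about one error per 1,000 bases.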
Kicking off the user presentations, Yahya Anvar from Leiden University Medical Center presented results from using SMRT Sequencing to study drug metabolism, specifically variants in the CYP2D6 gene. Anvar’s team has developed a CYP2D6 genotyping approach that enables his group to obtain high-quality, full-length, phased CYP2D6 sequences. According to Anvar, this leads to accurate variant calling and haplotyping of the entire gene locus, including exonic, intronic, and upstream and downstream regions. In addition to accurate characterization of variants within this locus, they can reliably describe copy-number changes, rearrangements, and gene conversions that have been missed by standard genotyping assays. He concluded that this method provides a powerful framework to infer drug response phenotype.
Christine Beck, a postdoctoral fellow at Baylor College of Medicine, discussed the use of target capture for complex genomic loci. The team uses targeted large-insert capture of human chromosome region 17p11.2 combined with long-read PacBio sequencing, which has allowed them to identify novel breakpoint junctional sequences in previously intractable repetitive DNA at this locus. She detailed the use of genomic approaches to characterize additional rearrangements of this structurally complex region and described mechanistic insights into genomic rearrangement formation that have been gleaned from these data.
Aaron Wenger, a senior staff research scientist at PacBio, spoke about improved support for long reads in the Integrative Genomics Viewer (IGV). New features include a quick consensus mode that suppresses random base-pair errors, quick phasing to group reads based on the nucleotide at a selected heterozygous variant, and labels for large insertions and deletions to reveal structural variants. He also presented examples of the extended IGV to explore haplotype phasing and structural variants in a human whole genome sequence. He noted that the development build of the viewer is available to download.
In Euan Ashley’s lab at Stanford, researchers are studying cardiac disease genes using SMRT Sequencing. Graduate student Alexandra Dainis presented their use of targeted Iso-Seq to phase cardiac disease genes. They were interested in using PacBio long-read sequencing because, unlike short-read sequencing, it can capture multiple SNPs or mutations on a single sequencing read and provide phased genetic information without the need for familial sequencing or inferential phasing from population data. Dainis discussed their work in hypertrophic cardiomyopathy, an autosomal genetic disorder that remains a leading cause of sudden death in young adults. Phasing disease-causing mutations may reveal disease-associated haplotypes that could be targets for new genetic therapies. The team has phased two sarcomeric genes (MYH7 and MYBPC3) in 10 left-ventricular heart RNA samples, from both controls and diseased hearts, and used this data to phase exonic disease-causing mutations and common SNPs into haplotypes for each sample. Their goal is to proceed to the development of new, haplotype-specific therapeutics.
Continuing the theme of human studies, Tina Graves-Lindsay from the McDonnell Genome Institute at Washington University School of Medicine spoke about plans to provide additional allelic diversity to the current human reference sequence by generating high-quality, highly contiguous human genome assemblies of individuals representing diverse populations. To date, they have sequenced seven diploid genomes. Their strategy involves generating deep coverage of PacBio sequence and scaffolding using optical mapping or cross-linking technologies to give even larger, chromosome-level information. This strategy also involves the use of large insert clone sequencing in targeted regions, which are typically not resolved in the whole genome assemblies.
Jason Underwood, who is both a principal scientist at PacBio and a senior fellow at the University of Washington, talked about the challenges posed by segmental duplications. An important source of genetic instability, they are associated with both rare and common diseases and can provide seeds for evolutionary innovation. UW used the Iso-Seq method to yield full-length transcript information and distinguish between gene copies with more than 99% sequence homology. Their approach uses complementary biotinylated oligonucleotide probes to enrich for duplicate genes from cDNA. They designed probes to 20 gene families that underwent duplications specifically on the human lineage since divergence from chimpanzee. Sequence analysis of captured cDNA from fetal and adult brain revealed mean transcript sizes ranging from 1,200 bp to 2,300 bp with transcripts up to 4 kb identified with high confidence. Among the human-specific duplications, they observed new isoforms, including novel sites of transcription initiation and polyadenylation, as well as previously unannotated open-reading frames, indicating that potentially novel human-specific brain mRNAs have previously been missed by short-read profiling.
The talks also included several studies of plants and animals. Stephen Mondo of the Joint Genome Institute focused on epigenetics, specifically N6-methyldeoxyadenine (6mA), which has been found in only four species: the alga Chlamydomonas reinhardtii, Drosophila melanogaster, C. elegans, and Mus musculus. Despite appearing at low levels, 6mA is critical for proper development, as it plays an important role in regulating gene expression. JGI scientists conducted the first kingdom-wide exploration of 6mA in fungi, where they found abundant utilization of 6mA in early diverging fungi, with up to 2.8% of all adenines methylated, vastly exceeding the levels observed in other organisms. Their results demonstrated the importance of 6mA as a broadly conserved epigenomic mark in eukaryotes and implicated 6mA as an epigenomic mark transmissible across nuclear division.
Amanda Larracuente from the University of Rochester talked about work in Drosophila looking at satellite DNA (satDNA), large blocks of tandem repeats that accumulate in heterochromatic genomic regions with low recombination, such as near centromeres and on Y chromosomes. Using SMRT Sequencing and multiple algorithms and parameter combinations to determine the optimal assembly approaches for heterochromatic regions rich in satDNA, they revealed the structure of complex satDNA loci with unprecedented resolution. These assemblies are providing a platform for evolutionary and functional genomic studies of satDNAs and other repeat-rich regions of the genome.
We also heard about work with wine grapes from Dario Cantu at the University of California, Davis. The genomes of the grapes and their microbial communities can shed light on beneficial organisms and how to avoid infestations that can kill these high-value crops. Deep sequencing of rRNA and metagenomes has allowed UC Davis to characterize the microbial communities in the vineyard, while whole-genome shotgun sequencing provided them with the references necessary to apply metatranscriptomics and profile gene expression of all interacting organisms simultaneously, including the grapevine host. The highly heterozygous genome of Cabernet Sauvignon was sequenced at 140x coverage with the PacBio RS II using a combination of 20 kb and 30 kb DNA libraries, producing an assembly with a contig N50 of 2.17 Mb. SMRT Sequencing was also used to sequence the genomes of some of the most common and economically important grape pathogens. For most fungal species, entire chromosomes were reconstructed into single-contig, telomere-to-telomere assemblies.
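As a reminder of what a contig N50 figure like the one above summarizes, here is a minimal sketch using the common definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly. The function name and the toy lengths are hypothetical and are not taken from the talk.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig (or read) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    toy_contigs = [10, 20, 30, 40]  # made-up lengths for illustration
    print(n50(toy_contigs))
```

The same calculation applied to read lengths gives the read N50, the point at which half of the sequenced bases sit in reads of that length or longer.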
Tim Smith from the USDA Agricultural Research Service gave an update on work with the goat genome as a model for chromosome-scale assemblies. He pointed out that highly fragmented short-read assemblies impede downstream applications. That’s why their work for de novo assembly of the domestic goat (Capra hircus) is based on PacBio long reads for contig formation, short reads for consensus validation, and scaffolding by optical and chromatin interaction mapping. These combined technologies produced the most contiguous de novo mammalian assembly to date, with chromosome-length scaffolds and only 663 gaps. The assembly represents a better than 250-fold improvement in contiguity compared to the previously published C. hircus assembly, and resolves many repetitive structures, including the most complete repeat family and immune gene complex representation ever produced for a ruminant species.
Our chief scientific officer, Jonas Korlach, capped off the day with a talk about how SMRT Sequencing is enabling a future of high-quality genomes, transcriptomes, and epigenomes. He said that scientific papers using SMRT Sequencing technology are being published at a rate of 25-30 per week, with more than 650 so far this year. He noted that SMRT Sequencing is now established as the gold standard for closing bacterial genomes, and that there has also been an explosion in its use for methylation detection in bacteria, unleashing a new era in bacterial methylomes. He congratulated PacBio users who belong to the “1 MB Contig Club,” which now extends to characterizing transcriptomes. He also highlighted recent work on maize and sorghum as well as human genomes, where we’ve seen a number of high-quality assemblies from various ethnic populations. Korlach highlighted the differences between contigs and scaffolds, and how long strings of unknown bases in assemblies dramatically alter their utility.
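To illustrate the contig versus scaffold distinction, here is a small sketch that assumes the usual FASTA convention in which unknown bases within a scaffold are written as runs of N. It splits a scaffold into its constituent contigs and counts the gaps. The function names and the toy sequence are hypothetical.

```python
import re
from typing import List

# One or more consecutive unknown bases, upper or lower case.
GAP_PATTERN = re.compile(r"[Nn]+")


def split_scaffold_into_contigs(scaffold: str) -> List[str]:
    """Split a scaffold sequence at runs of N, returning the gap-free contigs."""
    return [piece for piece in GAP_PATTERN.split(scaffold) if piece]


def count_gaps(scaffold: str) -> int:
    """Count the runs of unknown bases in a scaffold sequence."""
    return len(GAP_PATTERN.findall(scaffold))


if __name__ == "__main__":
    toy_scaffold = "ACGTNNNNGGCCNNA"  # made-up sequence for illustration
    print(split_scaffold_into_contigs(toy_scaffold))
    print(count_gaps(toy_scaffold))
```

A scaffold can look long while being built from many short contigs, so reporting contig-level statistics and gap counts alongside scaffold length gives a clearer picture of how much sequence is actually known.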
We’d like to thank our host, Jodi Puglisi from Stanford, as well as the partners present at the event: Advanced Analytical Technologies, Computomics, Covaris, Diagenode, DNAnexus, PerkinElmer, and Sage Science.