We’ve been in the genomics world long enough to remember when it was a big deal to see a great single-gene assembly or microbial genome assembly reported in an AGBT talk. It’s really something to attend this year and see some beautifully assembled whole human genomes.
Several of the Friday talks really captured our interest, but we can only cover a couple of them here. NCBI’s Valerie Schneider spoke about efforts through the Genome Reference Consortium to improve assembly of the human reference genome, noting that one challenge has been the shift from a clone-based approach during the Human Genome Project to whole-genome sequences today. While these new sequences are adding tremendously valuable information to the reference assembly and are shaping how it is curated, she said, they also introduce different assembly issues that have to be reconciled with existing information.
Schneider noted that considerable improvements have occurred for highly repetitive regions, such as the mucin genes. SMRT Sequencing has made it possible to fully resolve many of these regions, which had long appeared intractable. She also presented recent work on the CHM1 and CHM13 hydatidiform moles, whose haploid human genomes, sequenced with long reads, are helping make sense of some complex regions in the assembly. Schneider illustrated the challenge of choosing which sequence to add to the reference by presenting a number of quality metrics indicating that some assemblies were better for, say, contiguity, while others were better for quality value (QV). “No one assembly is excelling for all metrics,” she said.
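For readers less familiar with the QV metric, here is a minimal sketch of how it relates to error rate. It assumes QV refers to the usual Phred-scaled quality value, QV = -10 · log10(error probability); the function names are ours and are not taken from the talk.

```python
import math


def qv_from_error_rate(error_rate):
    """Convert a per-base error probability to a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


def error_rate_from_qv(qv):
    """Convert a Phred-scaled quality value back to a per-base error probability."""
    return 10 ** (-qv / 10)


if __name__ == "__main__":
    # Illustrative values only, not figures from the talk.
    print(round(qv_from_error_rate(0.001), 2))  # one error per 1,000 bases
    print(error_rate_from_qv(40))               # error probability at QV 40
```

Under this convention, every 10-point gain in QV corresponds to a tenfold drop in error rate, which is why an assembly can lead on contiguity while still trailing on accuracy.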
During another talk, Karyn Meltz Steinberg from the McDonnell Genome Institute at Washington University reported the first African reference genome assembly, a Yoruban sample analyzed with 70x coverage of SMRT Sequencing data. She told attendees that the best strategy to achieve a gold-quality genome is to use PacBio sequencing, which offers a vast improvement over short-read approaches. The team used a BioNano Genomics genome map to add extremely long-range scaffold information, boosting the already impressive contig N50 of 6 Mb to a scaffold N50 of nearly 15 Mb.
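As a reminder of what the N50 figures mean, here is a small sketch of the standard calculation: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly size. The toy lengths below are made up for illustration and are not data from the Yoruban assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in Mb.
    contigs = [6, 5, 4, 3, 2]
    # The same bases after scaffolding joins contigs into two longer pieces.
    scaffolds = [11, 9]
    print(n50(contigs))
    print(n50(scaffolds))
```

Because scaffolding joins contigs without adding sequence, the total stays the same while the individual pieces get longer, which is how a genome map can lift N50 so sharply.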
Also in the informatics session, Maria Nattestad from Cold Spring Harbor Laboratory presented an algorithm called SplitThreader for analyzing highly amplified or rearranged cancer genomes. Inspired by examples like a commonly used Her2-amplified breast cancer cell line, which has a full complement of 80 chromosomes, the SplitThreader algorithm analyzes complex events to find the most likely evolutionary path that created them. With PacBio sequencing data, the tool was able to uncover and visualize new candidate fusion genes.
In human microbiome work, Gregory Buck from Virginia Commonwealth University presented data from two projects designed to elucidate the profiles of vaginal microbial communities by studying thousands of women. This particular microbiome may be associated with preterm birth, HIV risk, and more. Buck noted that some of the microbes discovered have been sequenced with PacBio systems to produce remarkably high-quality, fully closed assemblies in very little time. The projects have identified 20 vagitypes, or typical microbial community profiles, some of which appear to be influenced by genetics and ancestry.
We recorded some of the AGBT talks this year and will be making those videos available on the blog shortly. Stay tuned!
February 19, 2016 | General