A new publication in Genome Research shows how SMRT Sequencing, in combination with other technologies, can reveal far more about repetitive DNA and structural variants than short-read sequencing alone. In this paper, scientists compared genome assemblies produced with short reads, long reads, and optical maps to understand the performance of each approach.
The team, from Uppsala University, the University of Munich, and Bionano Genomics, studied the Eurasian crow for this project. The resulting paper, “Combination of short-read, long-read and optical mapping assemblies reveals large-scale tandem repeat arrays with population genetic implications,” comes from lead author Matthias Weissensteiner, senior author Jochen Wolf, and collaborators. They used an existing short-read assembly and generated a de novo PacBio long-read assembly and an optical map with Bionano Genomics, all from the same individual.
The PacBio-only assembly delivered a major improvement over the short-read assembly. Contiguity increased by almost 90-fold, with the long-read assembly featuring a contig N50 longer than 8.5 Mb. The SMRT Sequencing assembly also resolved more than 70 Mb of sequence missed in the short-read assembly, including nearly 16 Mb of repetitive elements.
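For readers less familiar with the contiguity metric mentioned above, contig N50 is the contig length at which the contigs of that length or longer together cover at least half of the total assembly. The short Python sketch below illustrates the calculation. The function name `contig_n50` and the toy contig lengths are illustrative assumptions, not values or code from the paper.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    account for at least half of the summed assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths in bp, invented for illustration only.
toy_lengths = [9_000_000, 8_500_000, 4_000_000, 2_000_000, 500_000]
print(contig_n50(toy_lengths))
```

A higher N50 means that more of the assembly sits in long, uninterrupted contigs, which is why it is commonly used to compare assemblies of the same genome.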
The various assemblies were then compared and joined to determine how each source of information contributed to a final, high-quality genome resource. This step allowed the team to spot mis-assemblies, which occurred more frequently in the short-read assembly. They detected 43 mis-joins in the short-read assembly, and fewer than half that number in the long-read assembly.
One of the motivating factors for this project was an interest in understanding the repetitive DNA associated with constitutive heterochromatin, which has an influence on recombination. To that end, the team analyzed large tandem repeat arrays in the crow genome and used population resequencing data to estimate effects on recombination rate. “We characterized 36 previously unidentified large repetitive regions in the proximity of sequence assembly breakpoints, the majority of which contained complex arrays of a 14-kb satellite repeat or its 1.2-kb subunit,” the scientists report. They determined that the recombination rate was “significantly reduced in these regions.”
“Our results demonstrate the potential of combining independent technologies to discover previously inaccessible genomic features,” Weissensteiner et al. write. “With an emerging picture of genome architecture affecting the distribution of genetic diversity across genomes, the integration of large tandem repeat arrays into genome assemblies constitutes an important improvement.”
We’re delighted to see the release of another high-quality avian genome, which will support ongoing efforts in the B10K and G10K projects to represent as many species as possible.
April 11, 2017 | General