Scientists from Rutgers University and the University of California, Davis, used SMRT Sequencing to study structural variation in maize. They found that this approach delivered more complete information at lower cost than standard methods and generated new findings that could be important for crop breeding.
The paper, “Analysis of tandem gene copies in maize chromosomal regions reconstructed from long sequence reads,” by lead author Jiaqiang Dong, senior author Jo Messing, and collaborators, was recently published in PNAS. The team chose to evaluate SMRT Sequencing for copy number detection as an alternative to short-read sequencing, which doesn’t span long repeats, and BAC cloning, which is prohibitively expensive. “The single most critical parameter is the length of each sequence read to establish overlaps without the need of genomic clone libraries,” the authors write. “Therefore, we tested the new SMRT technology to determine whether we could assemble chromosomal regions from one shotgun DNA sequencing dataset that would comprise large tandem gene copies.”
They chose maize because of its high proportion of repetitive sequence (repeats make up a remarkable 85% of its genome) and focused on the alpha zein gene family. Spread across six chromosomes, the gene family is important because it “acts as a sink for reduced nitrogen in the seed,” the authors explain. Other maize strains have been found to carry as many as 48 copies of these genes.
The team notes that the average read length generated by the PacBio System was “26 times longer than Illumina and 8 times longer than ABI3730, providing us with significantly more contiguous information for shotgun DNA sequence assemblies.” This long-read data enabled the comprehensive genomic picture that the scientists were hoping for: “Based on this high-quality single shotgun DNA sequencing dataset, we were able to use zein gene sequences as digital probes to assemble the entire collection of orthologous regions from [the W22 strain],” they report. A detailed analysis demonstrated that the self-corrected SMRT Sequencing data had an error rate of less than 0.1%.
SMRT Sequencing proved useful “for resolution of large complex repeats or tandem/dispersed gene family clusters,” the scientists conclude. “Given the effectiveness of this approach in maize, we anticipate that it will be of general use with any complex genome including human and, in particular, cancer genomics, where structural changes can be dramatic.”
August 2, 2016 | General