Nature Biotechnology | 2019
Wenger, Aaron M and Peluso, Paul and Rowell, William J and Chang, Pi-Chuan and Hall, Richard J and Concepcion, Gregory T and Ebler, Jana and Fungtammasan, Arkarachai and Kolesnikov, Alexey and Olson, Nathan D and Töpfer, Armin and Alonge, Michael and Mahmoud, Medhat and Qian, Yufeng and Chin, Chen-Shan and Phillippy, Adam M and Schatz, Michael C and Myers, Gene and DePristo, Mark A and Ruan, Jue and Marschall, Tobias and Sedlazeck, Fritz J and Zook, Justin M and Li, Heng and Koren, Sergey and Carroll, Andrew and Rank, David R and Hunkapiller, Michael W
The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the Genome in a Bottle (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
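
The contig N50 quoted above is a standard contiguity metric: the contig length at which contigs of that length or longer contain at least half of all assembled bases. The following is a minimal sketch of that definition only. The function name `contig_n50` and the toy contig lengths are illustrative assumptions and are not taken from the paper or its assembly pipeline.

```python
from typing import Iterable


def contig_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L of the contig at which the running total,
    taken over contigs sorted from longest to shortest, first reaches
    at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running total always reaches the total")


# Toy example (hypothetical lengths, in arbitrary units): total = 290,
# and the two longest contigs (80 + 70 = 150) already cover half of it.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(contig_n50(toy_lengths))  # 70
```

Under this definition, a contig N50 above 15 Mb means that at least half of the assembled bases lie in contigs longer than 15 Mb.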