In a new preprint, entitled “Realizing the promise of biodiversity genomics with highly accurate long reads”, a group of researchers from Utah State, BYU, the LOEWE Centre for Translational Biodiversity Genomics in Germany, the University of Utah and the Smithsonian Institution evaluate how different types of sequence data influence genome assembly. They note that while long-read sequencing initially focused primarily on improving genome contiguity, increasing attention is now paid to how complex, repetitive regions are assembled, ideally in a haplotype-phased manner.
To investigate how accurate long HiFi reads compare to more error-prone sequence reads, they first compared de novo assemblies of a caddisfly built from HiFi and from ONT (Oxford Nanopore Technologies) data. They observed that “despite shorter reads and less coverage, HiFi reads outperformed ONT reads in all assembly metrics tested”. The HiFi-based assembly had dramatically higher contiguity (11.2 Mb vs. 0.7 Mb for ONT), as well as greater completeness (95.6% vs. 93%). In addition, the researchers used the long and repetitive H-fibroin gene as a surrogate for complex but phenotypically important genes. The gene was correctly resolved in the HiFi assembly, annotating to a single gene with a single intron in the N-terminal region and a second large exon (25.3 kb) with a well-resolved repetitive structure, “giving high confidence in the accuracy of the assembly”. In contrast, the corresponding region in the ONT assembly “annotated a dozen genes in the ∼30Kb region, most of which did not include the characteristic repeats known from previous data.”
Next, the researchers asked whether these differences extend across animals and plants more broadly. In a field-wide meta-analysis, they quantified the influence of data type on genome assemblies across 6,750 plant and animal genomes. They found that “HiFi reads consistently outperform all other data types for both plants and animals and may represent a particularly valuable tool for assembling complex plant genomes.” Based on these findings, the researchers concluded:
We are a proud partner of numerous biodiversity genomics initiatives, including the Earth BioGenome Project, the Darwin Tree of Life Project, the Vertebrate Genome Project, the African BioGenome Project, the European Reference Genome Atlas, the Ag100Pest Initiative (which recently announced the assembly of the desert locust genome, nearly three times the size of the human genome and “one of the largest insect genomes ever completed”), and many more. We look forward to connecting with you to help make HiFi-based accurate, complete, contiguous and haplotype-resolved plant and animal genome assemblies the cornerstone of your biodiversity research.