Scientists at the Boyce Thompson Institute, Cornell University and the USDA Agricultural Research Service have reported significant progress in understanding the genomic features of domestic and wild apples. They used HiFi reads (highly accurate long reads) generated by the Sequel II System to build phased, diploid genome assemblies. They also built apple pangenomes to represent more of the remarkable genetic diversity in this lineage and to better characterize its domestication history.
The paper, published in Nature Genetics, comes from lead authors Xuepeng Sun (@XuepengBio), Chen Jiao, and Heidi Schwaninger; senior author Zhangjun Fei (@fei_lab); and collaborators.
We asked Fei about the highlights of the team’s efforts, and here’s how he summed it up: “We assembled phased diploid genomes of modern apple (Malus domestica) and the two major wild progenitors, M. sieversii and M. sylvestris using PacBio HiFi reads and Illumina short reads, and constructed pan-genomes of the three species. We inferred the genetic contributions of the two wild progenitors to the cultivated apple, and identified genome regions under selection during apple domestication and associated with important traits such as fruit size, texture and taste.”
The team focused on the tasty Gala apple, knowing that producing an accurate genome assembly would require more than short-read sequencing data.
“Most crop plants have complex genomes characterized by large size, high heterozygosity level and polyploidy,” they write in the paper. “The apple genome is highly heterozygous, posing a major challenge for earlier genome assemblies.”
To address those challenges, the scientists incorporated HiFi reads into their strategy, sequencing the Gala apple and its wild progenitors at coverages ranging from 37-fold to 81-fold. These HiFi reads were then assembled using hifiasm and HiCanu (read more about these and other options for HiFi assemblers in this blog post). Those results were merged with orthogonal data sets to create diploid genomes for each of the three apples, with final assemblies reaching about 1.3 Gb.
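For readers less familiar with the term, fold coverage is the total number of sequenced bases divided by the genome size. Here is a minimal Python sketch of that arithmetic. The function name, read lengths and genome size are illustrative assumptions, not values from the paper.

```python
def fold_coverage(read_lengths, genome_size):
    """Total sequenced bases divided by genome size (both in bp)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


# Hypothetical toy data: four 15 kb reads over a 20 kb genome.
toy_reads = [15_000, 15_000, 15_000, 15_000]
print(fold_coverage(toy_reads, 20_000))  # 3.0
```

In a real project the read lengths would come from the sequencing output and the genome size from an independent estimate, but the calculation is the same.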
“Despite high heterozygosity rates (0.85–1.28%), all assemblies showed high contiguity, with the scaffold N50 of 3.3–4.3 Mb in diploid assemblies and 16.8–35.7 Mb in haploid consensuses,” they add.
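As a reminder of what the N50 figures in that quote measure: N50 is the length of the scaffold at which, counting from the longest scaffold downward, the running total first reaches half of the assembly. The Python sketch below shows the usual calculation on a toy example. The function name and the scaffold lengths are hypothetical and are not taken from the apple assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that brings the running sum to half
        # of the assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb, not from the paper.
toy_scaffolds_mb = [10, 8, 6, 4, 2]
print(n50(toy_scaffolds_mb))  # 8
```

Here the scaffolds total 30 Mb, and the two longest (10 Mb and 8 Mb) already cover more than 15 Mb, so the N50 is 8 Mb. A higher N50 means more of the assembly sits in long, contiguous pieces.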
The extremely high quality of the final assemblies allowed the scientists to identify an error in previously published apple genomes associated with a 5 Mb inversion on Chromosome 1.
But the team also wanted to go beyond just one high-quality assembly for the Gala apple, pointing out that “a single reference genome can by no means represent a whole population.” To that end, they constructed a pangenome for each of the three apple types, using 91 accessions to capture natural genetic diversity. Through this work, they added between 89 Mb and 212 Mb of novel sequence data to each genome, covering thousands of new genes.
“Unlike annual crops such as the tomato, the pan-genome size of the cultivated apple is larger than that of wild progenitors, possibly due to the outcrossing nature and extensive introgression from wild species,” Sun et al. write. “This distinctive feature suggests that introgression of new genes/alleles is possibly a hallmark of crops domesticated through hybridization.”
One of the most important motivations for this study was to support apple breeding programs through a deeper understanding of trait variability.
“Traits introgressed in the hybrid are often not fixed and could be lost when propagated by seeds,” the authors note. “Understanding of the molecular basis of trait variability, which requires the knowledge of the diploid alleles, is critical for fixation of desirable traits in apple breeding.”
For more examples of pangenomes generated with SMRT Sequencing, see:
- Webinar Summary: Crops and Corvids get the Pangenome Treatment with HiFi Sequencing
- Pangenome of Soybean Generated to Capture Genomic Diversity
- Project to Rapidly Sequence Maize Pangenome Delivers Publicly Available Resource
- Sequencing 101: Looking Beyond the Single Reference Genome to a Pangenome for Every Species
- Case Study: Pioneering a Pan-Genome Reference Collection