PacBio HiFi sequencing has been utilized to generate reference-quality genomes in many studies, most notably in finally completing a human genome. Following this work, the Human Pangenome Reference Consortium (HPRC) is now generating over 300 reference-grade genomes to fully capture the genetic diversity across the human population.
This shift towards accurate, complete, and diverse reference genomes based on HiFi reads is now under way for many other species. As another recent example, a preprint entitled “The Chlamydomonas Genome Project, version 6: reference assemblies for mating type plus and minus strains reveal extensive structural mutation in the laboratory”, by researchers from UC Berkeley and collaborators, used PacBio HiFi genome sequencing and the Iso-Seq method to provide a major upgrade of the reference genome of the unicellular green alga Chlamydomonas reinhardtii.
Chlamydomonas reinhardtii is one of the primary model organisms in plant and cell biology, having been studied in the context of many fundamental biological processes, such as photosynthesis, the cell cycle, and sexual reproduction. It is used for many applications in the growing field of algal biotechnology, and it was the first alga subjected to a genome project. Five versions of the Chlamydomonas reinhardtii reference genome have been produced over the last two decades. In the new preprint, the authors present version 6, “bringing significant advances in assembly quality and structural annotations”, including:
- Chromosome-level assemblies for two laboratory strains, providing separate reference-quality genomes for both mating type alleles
- Correction of major misassemblies in previous versions
- A more than ten-fold increase in contiguity
- Gap filling, with >80% of the filled gaps located within genes
- Improved structural annotations, updated gene symbols and annotation of functionally characterized genes via extensive curation
The authors concluded:
Based on their findings related to major structural mutations in these assembled genomes, the authors caution that all laboratory strains are expected to harbor gene-disrupting mutations, “which should be considered when interpreting and comparing experimental results across laboratories and over time.” In addition, a single strain does not represent the genomic diversity present among Chlamydomonas laboratory strains, as the “two haplotypes differ at ~2% of sites, which is approximately equivalent to the average genetic diversity between any two field isolates from the same location”. For this reason, a pan-genome project has already been started, “aiming to produce genomes for several additional laboratory strains and field isolates”. The authors note that “many insights can only be gleaned by comparing the genomes of different strains, and we can expect substantial benefits from sequencing additional strains and isolates moving forward. With respect to the two laboratory haplotypes, a ‘laboratory pan-genome’ could be produced …, capturing all ancestral variation present among laboratory strains.”
Thanks to the combined accuracy and read length of HiFi sequencing and robust, efficient assembly methods, the era of reference-quality pan-genomes is now a reality. If you are wondering about the complete genome sequence of your favorite organism, and how the different strains in your lab truly differ genome-wide, generating references for all the strains and species of interest now offers greener pastures, and not just for this alga.
Contact us to discuss how you can produce reference pan-genomes for your studies as well.