Nearly gapless, reference-quality chromosome-level assemblies — in less than a day? Yes, it’s possible, thanks to the high accuracy and low computational needs of PacBio HiFi reads.
Kevin Fengler, computational genomics lead at Corteva Agriscience, welcomed watchers to the brave new world of the pangenome during the recent webinar, “Beyond a Single Reference Genome – The Advantages of Sequencing Multiple Individuals.”
We are now living in an era where you can generate a reference genome assembly that’s specific for each application or trait of interest, Fengler said.
“Often we’re interested in getting the sequence of a single disease resistance gene or the sequence of a particular QTL, and we’ll do a whole genome just for that,” Fengler said. “It may seem like overkill, but we have found that the best approach — the fastest, easiest, simplest, most cost effective way to do that — is just to generate whole genome reference quality assemblies.”
Fengler cited several benefits of HiFi reads that have made this possible. Foremost among them were lower computational demands and high accuracy, even with relatively “short” reads.
“I used to be a long-read junkie, always trying to get 50 kb, 60 kb reads. But with reads that are only 15 kb in length, we’re now able to achieve these highly contiguous, highly accurate assemblies,” Fengler said. “The HiFi reads are so accurate that you don’t need to do any additional polishing with Illumina, or even additional polishing with PacBio, which used to be a step.”
In many cases, Fengler said, he has been able to assemble through the centromere.
“This is another cool thing that’s developed with HiFi. With our previous CLR assemblies, we never would have assembled through the centromere for plants, but now we are able to get a single scaffold per genome.”
Fengler emphasized that pangenome assemblies need to be robust, “because misassembly is not SV, and sequencing error is not variation.”
He shared several examples of crops that have received the pangenome treatment, including cotton, which, he noted, “would not be considered, historically, an easy genome to assemble by any means.”
“But here we’re getting single contigs in most cases for most of the chromosomes,” he said. “This is what you really need. This is the goal, this is what we’re trying to achieve.”
He also discussed two of the tools he uses to analyze the sequence diversity between the genomes and make it actionable, TagDots and PANDA.
Watch Fengler’s full presentation:
Crossing a continental crow divide
Pangenome collections are valuable not only for comparing commercial crop breeds for certain traits; they can also help answer questions about the evolution and population dynamics of non-model species.
Matthias H. Weissensteiner (@MWeissensteiner), a postdoc at Pennsylvania State University, discussed his work studying structural variation among several songbird species in the genus Corvus, some of which was included in his recently published Nature Communications paper.
About 60 species of the genus display the typical all-black crow plumage pattern, but there are also a few black-and-grey and black-and-white forms. In Europe, there is a ‘crow divide,’ with all-black crows in the west, black-and-grey crows in the east, and a narrow hybrid zone in between.
“They look like two species, they behave like two species, but when we looked at the genetic differentiation based on single nucleotide changes, we found out that they are actually genetically more or less the same,” Weissensteiner said. “Only 83 nucleotides out of a genome of 1.3 billion base pairs are fixed, meaning that there are only about 80 differences which are diagnostic for these two crow populations.”
In order to uncover the secrets of their speciation, Weissensteiner and colleagues sequenced 33 crows and created a dataset comprising the full phylogenetic range of the genus.
For Weissensteiner, the value of long reads was clear: Because they allowed him to anchor his reads completely to the reference, he could more confidently capture the correct sequence and identify insertions, deletions, inversions and other variations.
“We combined different types of sequencing and mapping technologies and found that long-read sequencing in particular is able to reveal a stunning amount of genetic variation.”
Having assemblies from across the entire genus made filtering the data a bit easier, as well, Weissensteiner said. Because the researchers had large phylogenetic distances within their data, they were able to remove variants that seemed to be segregating across the clades.
“If you have a variant that is polymorphic within the crow clade and polymorphic within the jackdaw clade, it’s likely to be an error because over these large phylogenetic distances, there should not be any segregating variation,” he explained.
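The filtering logic Weissensteiner describes can be illustrated with a minimal Python sketch. This is not the pipeline used in the study; the data layout, sample names, variant IDs and function names below are all hypothetical, and genotypes are simplified to 0 (reference allele), 1 (variant allele) or None (no call). A variant is dropped when it appears polymorphic within two or more clades, since that pattern is more likely an error than true shared variation.

```python
from typing import Dict, List, Optional

# Hypothetical layout: variant ID -> {sample name -> genotype (0, 1, or None)}
Genotypes = Dict[str, Dict[str, Optional[int]]]


def is_polymorphic(calls: List[Optional[int]]) -> bool:
    """Return True if more than one allele is seen among the called genotypes."""
    called = {g for g in calls if g is not None}
    return len(called) > 1


def filter_variants(variants: Genotypes, clades: Dict[str, List[str]]) -> List[str]:
    """Keep variants that are polymorphic in at most one clade."""
    kept = []
    for variant_id, calls in variants.items():
        polymorphic_clades = 0
        for samples in clades.values():
            clade_calls = [calls.get(sample) for sample in samples]
            if is_polymorphic(clade_calls):
                polymorphic_clades += 1
        # Segregating in two or more distant clades: treat as a likely error.
        if polymorphic_clades < 2:
            kept.append(variant_id)
    return kept


# Toy example with made-up samples and variants.
clades = {
    "crow": ["crow_a", "crow_b"],
    "jackdaw": ["jackdaw_a", "jackdaw_b"],
}
variants = {
    # Polymorphic in both clades: removed.
    "sv1": {"crow_a": 0, "crow_b": 1, "jackdaw_a": 0, "jackdaw_b": 1},
    # Polymorphic only within crows: kept.
    "sv2": {"crow_a": 0, "crow_b": 1, "jackdaw_a": 0, "jackdaw_b": 0},
    # Fixed difference between the clades: kept.
    "sv3": {"crow_a": 1, "crow_b": 1, "jackdaw_a": 0, "jackdaw_b": 0},
}

print(filter_variants(variants, clades))
```

A real analysis would work from called structural variants across all 33 individuals and would need to handle diploid genotypes and missing data more carefully than this toy version does.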
Once he had a reliable set of variants, Weissensteiner looked for causal mutations for plumage differentiation and identified the most promising candidate: a 2.25 kb LTR retrotransposon insertion located 20 kb upstream of the NDP gene.
Watch Weissensteiner’s full presentation:
See additional examples of the use of SMRT Sequencing for the generation of pangenomes:
- Pangenome of Soybean Generated to Capture Genomic Diversity
- Project to Rapidly Sequence Maize Pangenome Delivers Publicly Available Resource
- Sequencing 101: Looking Beyond the Single Reference Genome to a Pangenome for Every Species
- Case Study: Pioneering a Pan-Genome Reference Collection
- Video: Dawn of the Crop Pangenome Era