What does genomics in action look like? At PAGBio Day 2022, we heard from scientists using high-quality genomic data to inform agricultural breeding programs, enhance endangered species monitoring, and fill the gaps in the evolutionary Tree of Life.
Hundreds of scientists from around the world gathered online to hear more about these and other genomic advances in agriculture, conservation and ecology enabled by HiFi sequencing.
Here are some of the highlights:
News from leadership
When it comes to kicking off an event, listening to the State of the Sequencing update from PacBio leaders is a big draw. CEO and President Christian Henry did not disappoint, giving participants a peek into recent and upcoming developments at the company.
Referencing recent partnerships and acquisitions, Henry noted that 2021 was among the most productive years in PacBio history, and that the company is now poised to become the leading provider of both highly accurate long-read and short-read technology.
The addition of Circulomics to the PacBio portfolio extends extraction and sample prep capabilities, with Nanobind disks that protect DNA to generate higher-quality reads across a broader range of sample types.
“This is particularly useful in the plant and animal community, as many species require very delicate handling of the DNA. Circulomics enables us to extract very high-quality DNA which then generates high quality sequencing.”
PacBio’s “sequencing by binding,” a differentiated short-read technology, is expected to be available to users in the first half of 2023, Henry revealed. Several improvements to the Sequel IIe system will also be introduced, including SMRTbell prep kit 3.0, Template binding kits 3.1/3.2, and SMRT Link v11.0. These are expected to increase efficiency and usability, with 40% less DNA input required, a 30% reduction in tubes and other consumables, and a 60% reduction in hands-on time.
Another big development is easier methylation calling, directly on the instrument, Henry said.
“What this means is that you’ll be able to do epigenetic studies directly from every single sequencing run,” he added. “We think this is a very important breakthrough in a number of different applications and markets, and we’re excited to bring it to the community.”
Highlighting HiFi sequencing as a key to conservation
A self-described superfan of big cats and other “charismatic carnivores,” Washington State University (WSU) postdoc Ellie Armstrong (@_ellie_cat) made a great case for the vital role HiFi sequencing can play in conservation.
She started with some sobering stats about the drastic decline in North American brown bears (from ~50,000 in the lower 48 states to fewer than 1,000 in 1975) and their shrinking habitat (they now occupy just 3% of their historic range).
Although the brown bear conservation community has been at the forefront of using cutting-edge technology to aid in their efforts, very few genomes have been sequenced across the bear’s range, with just two samples from the lower 48 states.
Genomic information can be used to monitor and track animals through ecosystems, infer relationships between individuals, identify genetic adaptation and fitness over time to see which animals are surviving better, detect inbreeding and inbreeding depression, and trace the source of illegally traded products. All of this information can inform management plans and how funds are allocated among different species and populations, Armstrong said.
“But the success of answering these questions really relies on large, comprehensive databases with geo-referenced individuals so that we understand how these genetic patterns relate to what is going on in the wild,” Armstrong added.
She argued for more efforts to fill in sequencing gaps across species, starting with high-quality reference genomes.
At WSU, her team has created a phased genome with vastly improved contiguity: 1,145 contigs with a 44 Mb contig N50, compared to 16,555 contigs with a 532 kb contig N50 in a previously published brown bear assembly, a more than 82-fold increase in contiguity.
Armstrong also worked on a SMRT Grant-winning project to upgrade the reference genome for leopards. Very little information is available to conservationists because the animal is extremely elusive and has a large range, which makes it difficult to track in the wild. And although it has the largest number of subspecies of any big cat (nine), only one reference genome is available, created from short reads.
Using biobanked samples from the Maryland Zoo, Armstrong and colleagues were able to create high-quality assemblies from two animals, with a 617-fold improvement in contiguity, going from a 40.5 kb contig N50 to a 25 Mb contig N50.
“When we ask the right questions and form interdisciplinary teams with diverse groups of scientists, genomics can play a powerful role in the conservation toolkit.” Ellie Armstrong, WSU
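For readers less familiar with the contiguity figures quoted above, here is a minimal Python sketch of how a contig N50 and a fold improvement can be computed. It is an illustration only, not the pipeline used by the WSU team: the function names and the toy contig lengths are hypothetical, and the unit conversion assumes 1 Mb = 1,000,000 bp and 1 kb = 1,000 bp.

```python
def contig_n50(lengths):
    """Return the contig N50 for a list of contig lengths (in bp).

    N50 is the length of the contig at which the running total,
    taken from the longest contig downward, first reaches half
    of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def fold_improvement(new_n50_bp, old_n50_bp):
    """Ratio of the new contig N50 to the old one (both in bp)."""
    return new_n50_bp / old_n50_bp


# Hypothetical toy assembly, only to show how N50 is derived.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = contig_n50(toy_contigs)

# N50 values reported in the talk, converted to bp
# (assumption: decimal units, 1 Mb = 1_000_000 bp, 1 kb = 1_000 bp).
brown_bear_fold = fold_improvement(44_000_000, 532_000)
leopard_fold = fold_improvement(25_000_000, 40_500)
```

With these inputs, the brown bear ratio comes out just under 83 and the leopard ratio just over 617, in line with the improvements reported in the talk.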
Sweet successes in strawberries
Imagine having to rely on diploid reference genomes for an octoploid species. That’s what the strawberry breeding community faced prior to the publication of the Camarosa reference genome in 2018.
In his talk, UC Davis postdoc Mitchell Feldmann (@MitchFeldmann) told PAGBio that this was “problematic.”
“This is a highly repetitive genome, not in the way that cattle or humans or maize are, but in every genome there are potentially eight copies of every single gene,” Feldmann said. “And you’re kind of forced into this octoploid dosing model of doing quantitative genetics.”
While imperfect, the Camarosa reference genome enabled Feldmann and his colleagues to build multiple high-quality genotyping arrays and make genetic discoveries related to disease resistance, aroma, flowering, and other traits.
Three years later, HiFi sequencing allowed the UC Davis team to create a greatly improved genome for UCD Royal Royce, published as a preprint in December 2021.
“The quality of the genome shocked us the first time we saw it,” Feldmann said. “All of the phased subgenomes were assembled better than the previously published reference genome. And the entire thing was done with a single SMRT Cell, decreasing the cost while increasing data density.”
See what he was able to learn about strawberry biology, and how the project could serve as a blueprint for phasing and assembling the genomes of heterozygous polyploids.
Back to basics
Next, Ted Kalbfleisch of the University of Kentucky took things back to the basics by providing an essential overview of the history of genomics and the state of sequencing today.
In addition to tracing the evolution of both short- and long-read sequencing, he posed some basic yet thought-provoking questions about democratizing genomics.
Q: Why do we build genomes?
A: We need context… context in which to better understand gene structure, and the impact of variation and epigenetic changes on it, so that we can ultimately make biological interpretations.
Q: Can anyone who needs a high-quality reference genome get one?
A: For the most part, yes. Sequencing services are readily available and affordable, as are informatics tools. Annotation processes and pipelines are a bit of a bottleneck at the moment, but the Iso-Seq method has been a game changer, and decentralized annotation models would help tremendously.
“To the degree that you have a species that hasn’t been shown a great deal of love by any of the larger genome consortium projects and you need to understand exactly what the genome of that species is, with modest funding you can probably find the people and the data to be able to get you where you need to go,” Kalbfleisch said.
And to round things out, PacBio experts were on hand to share tips and tricks in a series of talks:
● Stories of genome annotation using the Iso-Seq method
While there are relatively fixed sets of recipes for doing genome assemblies using PacBio HiFi data, genome annotation and QC of plant and animal genomes are still evolving. Many methods are used to annotate these genomes, and PacBio Iso-Seq expert Liz Tseng (@Magdoll) highlighted what she considers the “latest and greatest” ones, citing user examples.
● Resolving microbial communities in ecosystems
Looking to study microbial communities in soil, or nutrient uptake in rice paddies? Jeremy Wilkinson gave an overview of how HiFi reads can be applied in microbiome and metagenomic studies. As he pointed out, HiFi reads are uniquely well suited to metagenome assembly because you can skip the error correction step required by other technologies, allowing you to better resolve closely related strains.
● An evaluation of bioinformatics tools for HiFi plant pangenomes
One reference genome is often not enough, and researchers are increasingly turning to pangenomes to better represent plant and animal species. But how does one build a pangenome with limited bioinformatics training and a modest budget? Greg Concepcion (@phototrophic) gave a handy evaluation of bioinformatics tools, including an in-depth review of the PanGenome Graph Builder (PGGB) pipeline, which he used alongside Oregon CBD to construct a pangenome graph of cannabis and extract regions of interest for further exploration.
Learn more about whole-genome sequencing, metagenomics, and genome annotation by visiting the new Plant + Animal genomics page.