Microorganisms form the foundations of life on Earth. They shaped our planet long before we were here, and continue to shape our environments and our lives.
Understanding microbial diversity is crucial not only for conserving and maintaining ecosystems, but also for human health. Because many of these organisms are difficult to grow in the lab, they eluded scientists until recently. Today, sequencing technologies like the Revio system allow microbiologists to sequence microbes directly from the environment, without the need to culture them.
“In the past few years, we have been experiencing another step-up of the growth rate of available microbial genomes, but this time the driving factor is … the advent of the metagenome-assembled genome, or MAG, made possible by metagenomics and associated bioinformatics.”1
The metagenome-assembled genome (MAG) is among the latest developments in the field of microbiology. Here, we’ll explain the basics of MAGs, why they’re important, and how to create and use them.
What is a metagenome-assembled genome (MAG)?
A metagenome-assembled genome (MAG) is a species-level microbial genome constructed from community-level metagenomic data from a microbiome sample. MAGs are a powerful tool for cataloging microbial diversity, especially for non-culturable microorganisms. They have been used successfully to identify novel species and to study remote or complex environments such as soil, water, and the human gut.
Why are metagenome-assembled genomes important?
It’s well-established that the majority of prokaryotic life (a whopping 99%) is difficult or impossible to culture in the lab, leaving us to only guess at the microbial “dark matter” hidden in the planet’s tiniest organisms.2 Now, advances in sequencing technology and bioinformatics are helping microbiologists overcome that “unculturable” hurdle, with culture-independent methods to study microbial diversity.
Metagenome-assembled genomes allow us to better understand these unculturable microbial populations, in environments ranging from the human gut to active volcanoes.3 Our understanding of these microbes has implications for human health, drug discovery, environmental remediation, epidemiology, and more. Creating and applying MAGs has the potential to dramatically accelerate the pace of biodiversity discovery.
How are metagenome-assembled genomes created?
At a high level, metagenome-assembled genomes are created by first assembling sequencing reads and then binning the results. During the assembly stage, sequencing reads are stitched together to create contiguous fragments, or contigs. They are then binned, or organized into groups according to patterns that indicate which contigs belong to the same genome. Each resulting bin corresponds to a MAG.1
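To make the binning step more concrete, here is a minimal toy sketch in Python. It is not how any production binner works. It assumes contigs have already been assembled, that a mean coverage value has been computed for each one upstream, and that GC content plus coverage are enough to tell genomes apart (real tools use richer signals). The `Contig` class, the `bin_contigs` function, and the tolerance parameters are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    sequence: str
    coverage: float  # mean read depth; assumed to be computed upstream


def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases in a sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def bin_contigs(contigs, gc_tol=0.05, cov_ratio=1.5):
    """Greedy toy binning (illustrative only).

    A contig joins the first bin whose seed contig has similar GC content
    and similar coverage; otherwise it seeds a new bin.
    """
    bins = []  # each bin is a list of contigs; the first entry is the seed
    for contig in contigs:
        gc = gc_fraction(contig.sequence)
        for members in bins:
            seed = members[0]
            similar_gc = abs(gc - gc_fraction(seed.sequence)) <= gc_tol
            high = max(contig.coverage, seed.coverage)
            low = min(contig.coverage, seed.coverage)
            similar_cov = low > 0 and high / low <= cov_ratio
            if similar_gc and similar_cov:
                members.append(contig)
                break
        else:
            # No existing bin matched, so this contig starts a new one.
            bins.append([contig])
    return bins


# Made-up contigs: two GC-rich, high-coverage and two AT-rich, low-coverage.
contigs = [
    Contig("c1", "GCGCGCGCAT", 40.0),
    Contig("c2", "ATATATATGC", 10.0),
    Contig("c3", "GCGCGCGGAT", 42.0),
    Contig("c4", "ATATTTATGC", 11.0),
]
bins = bin_contigs(contigs)
for i, members in enumerate(bins, start=1):
    print(f"bin {i}: {[c.name for c in members]}")
```

In this toy input, the GC-rich, high-coverage contigs end up in one bin and the AT-rich, low-coverage contigs in another. Each bin would then be treated as a candidate MAG.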
This two-step process sounds deceptively simple, but in reality, it’s a complex undertaking to sort contigs into genomes without a reference. There are several challenges associated with metagenome assembly, including:
- The presence of multiple species
- Uneven and unknown species abundances
- Conserved genomic regions shared across species
- Strain-level variation within species
Highly accurate long reads provide major advantages for metagenome assembly, with the length and accuracy needed to achieve species- and strain-level resolution even in highly mixed samples. HiFi reads and metagenome assembly algorithms are making strides in helping to address the challenges above.
What is the best technology for generating metagenome-assembled genomes?
Long-read sequencing can overcome many of the challenges previously associated with metagenome-assembled genomes.
With traditional short-read sequencing, the contigs produced by metagenome assembly still only represent fragments of genomes. In contrast, long-read sequencing makes it possible to get one MAG from just one contig, because the reads are long enough to span the repeats and conserved regions that break short-read assemblies, so a single contig can cover a whole microbial genome. Short-read contigs rarely produce whole genomes, and they rely heavily on binning methods, which can introduce further errors.
Several studies have demonstrated that PacBio HiFi sequencing produces more total MAGs and higher quality MAGs than short-read sequencing.4–10 The difference between these two technologies is essentially the difference between draft, error-prone MAGs and reference-quality MAGs.
While it’s clear that long reads are better for metagenome assembly than short reads, which long-read technology performs best? Studies comparing the two have shown that HiFi sequencing outperforms nanopore sequencing for metagenome assembly.6,11 HiFi reads are typically up to 25 kb long, with 99.9% accuracy, making single-contig complete genomes possible.
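To put those read figures in perspective, here is a short arithmetic sketch in Python. It assumes the 99.9% figure can be read as a per-base accuracy applied uniformly along the read, which is a simplification, and the function names are illustrative. The conversion uses the standard Phred definition, Q = -10 · log10(error rate).

```python
import math


def phred_quality(accuracy: float) -> float:
    """Convert a per-base accuracy into a Phred-scaled quality score."""
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def expected_errors(read_length: int, accuracy: float) -> float:
    """Expected number of erroneous bases, assuming a uniform error rate."""
    return read_length * (1.0 - accuracy)


accuracy = 0.999      # 99.9%, treated here as per-base accuracy (assumption)
read_length = 25_000  # 25 kb

print(f"Phred quality: about Q{phred_quality(accuracy):.0f}")
print(f"Expected errors per read: about {expected_errors(read_length, accuracy):.0f}")
```

Under these assumptions, 99.9% accuracy corresponds to roughly Q30, or about 25 expected errors across a 25 kb read.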
In a new preprint titled “Highly accurate metagenome-assembled genomes from human gut microbiota using long-read assembly, binning, and consolidation methods,” researchers used HiFi sequencing to generate metagenome-assembled genomes from a pooled human gut microbiome.12 This study, and the bioinformatics methods within it, resulted from a collaboration between Dan Portik of PacBio and scientists at Zymo Research, Phase Genomics, and the BioCollective, as well as several academic researchers.
“Our study demonstrates that metagenome assembly with HiFi reads can produce large numbers of highly complete MAGs, corroborating the findings of previous studies.” 12
In this study, the authors describe the creation of HiFi-MAG-Pipeline, a new workflow for metagenome assemblies. They also developed a new algorithm, pb-MAG-mirror, for comparing the resulting MAGs from two binning approaches. The HiFi-MAG-Pipeline and all PacBio metagenomics pipelines are available on GitHub.
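For a feel of what comparing MAGs from two binning approaches can involve, here is a hypothetical Python sketch. It is not the pb-MAG-mirror algorithm, whose details are described in the preprint. It simply assumes each bin can be represented as a set of contig names and scores the overlap between bins with the Jaccard index; `match_bins`, the bin names, and the 0.5 threshold are illustrative choices.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: shared contigs divided by all contigs in either bin."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_bins(bins_a: dict, bins_b: dict, threshold: float = 0.5) -> dict:
    """For each bin in A, find the most similar bin in B (illustrative only).

    Returns {bin_a_name: (best_bin_b_name or None, best_score)}. The name is
    None when no bin in B reaches the similarity threshold.
    """
    matches = {}
    for name_a, contigs_a in bins_a.items():
        best_name, best_score = None, 0.0
        for name_b, contigs_b in bins_b.items():
            score = jaccard(contigs_a, contigs_b)
            if score > best_score:
                best_name, best_score = name_b, score
        if best_score < threshold:
            best_name = None
        matches[name_a] = (best_name, best_score)
    return matches


# Made-up output from two hypothetical binning approaches.
bins_a = {"A1": {"c1", "c2", "c3"}, "A2": {"c4", "c5"}, "A3": {"c9"}}
bins_b = {"B1": {"c1", "c2", "c3"}, "B2": {"c4", "c6", "c7"}}

matches = match_bins(bins_a, bins_b)
for name_a, (name_b, score) in matches.items():
    label = name_b if name_b is not None else "no confident match"
    print(f"{name_a}: {label} (Jaccard {score:.2f})")
```

In this toy input, A1 and B1 contain the same contigs, A2 and B2 share only one contig and fall below the threshold, and A3 appears in only one approach. A consolidation step would need to decide which version of each bin to keep.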
“Overall, we find the use of HiFi sequencing, improved metagenome assembly methods, and complementary binning strategies is highly effective for rapidly cataloging microbial genomes in complex microbiomes.”12
Interested in more details about the MAG pipeline? Read our in-depth article and explore some example datasets.
Tomorrow’s metagenomics breakthroughs begin today
Highly accurate HiFi sequencing and new analysis approaches are changing the game in metagenomics, equipping researchers with ever-more powerful tools to push new discoveries forward. We encourage you to see for yourself how HiFi sequencing and PacBio microbial genomics solutions can help you generate hundreds of high-quality metagenome-assembled genomes, many of which are single-contig, circular MAGs.
Curious about the realities and practical applications of metagenome-assembled genomes? Start with this webinar, where leading scientists from PacBio and Zymo Research present state-of-the-art HiFi metagenomic sequencing solutions that will enable researchers to make important discoveries that are not only high-impact, but also more precise, reproducible, and resource-efficient than ever.
Watch the webinar: Metagenome assembly and characterization of a pooled human fecal reference
References:
- Setubal JC. 2021. Metagenome-assembled genomes: concepts, analogies, and challenges. Biophysical Reviews, 13: 905–909.
- Rinke C, Schwientek P, Sczyrba A, et al. 2013. Insights into the phylogeny and coding potential of microbial dark matter. Nature, 499: 431–437.
- Wilkins LGE, Ettinger CL, Jospin G, et al. 2019. Metagenome-assembled genomes provide new insight into the microbial diversity of two thermal pools in Kamchatka, Russia. Scientific Reports, 9: 3059.
- Priest T, Orellana LH, Huettel B, Fuchs BM, and R Amann. 2021. Microbial metagenome-assembled genomes of the Fram Strait from short and long read sequencing platforms. PeerJ, 9: e11721.
- Gehrig JL, Portik DM, Driscoll MD, Jackson E, Chakraborty S, Gratalo D, Ashby M, and R Valladares. 2022. Finding the right fit: evaluation of short-read and long-read sequencing approaches to maximize the utility of clinical microbiome data. Microbial Genomics, 8: 000794.
- Meslier V, Quinquis B, Da Silva K, Plaza Onate F, Pons N, Roume H, Podar M, and M Almeida. 2022. Benchmarking second and third-generation sequencing platforms for microbial metagenomics. Scientific Data, 9: 694.
- Eisenhofer R, Nesme J, Santos-Bay L, Koziol A, Sorenson SJ, Alberdi A, and O Aizpurua. 2023. A comparison of short-read, HiFi long-read and hybrid strategies for genome-resolved metagenomics. bioRxiv, doi:10.1101/2023.10.04.560907
- Orellana LH, Kruger K, Sidhu C, and R Amann. 2023. Comparing genomes recovered from time-series metagenomes using long- and short-read sequencing technologies. Microbiome, 11: 105.
- Tao Y, Xun F, Zhao C, Mao Z, Li B, Xing P, and QL Wu. 2023. Improved assembly of metagenome-assembled genomes and viruses in Tibetan saline lake sediment by HiFi metagenomic sequencing. Microbiology Spectrum, 11: e03328–22.
- Zhang Z, Yang C, Veldsman WP, Fang X, and L Zhang. 2023. Benchmarking genome assembly methods on metagenomic sequencing data. Briefings in Bioinformatics, 24: 1–17.
- Sereika M, Kirkegaard RH, Karst SM, Michaelsen TY, Sorensen EA, Wollenberg RD, and M Albertsen. 2022. Oxford Nanopore R10.4 long-read sequencing enables the generation of near-finished bacterial genomes from pure cultures and metagenomes without short-read or reference polishing. Nature Methods, 19: 823–826.
- Portik DM, Feng X, Benoit G, et al. 2024. Highly accurate metagenome-assembled genomes from human gut microbiota using long-read assembly, binning, and consolidation methods. bioRxiv, 05.10.593587