Since the first PacBio instrument was released in 2011, methylation detection has been one of the advantages of SMRT Sequencing. The kinetics of nucleotide incorporation change as the DNA polymerase moves across a methylated position on the DNA template strand, producing distinctive perturbation patterns (Figure 1) that can be recognized by methylation-calling software.
With the advent of a simple method for detecting methylation in prokaryotes, researchers have demonstrated that in addition to functioning as a defense against phages, bacterial R-M systems can also drive important traits like antibiotic resistance, immune evasion, virulence and persistence in hosts.
Recent internal validation work has confirmed that detection of m6A and m4C in prokaryotic DNA, along with the R-M system target motifs in which they reside, continues to perform robustly on the Sequel II System. Detection of m5C still requires significantly higher coverage and is therefore not supported through the SMRT Analysis ‘Base Modification Analysis’ workflow.
Our initial validation was done on E. coli K, sequenced as part of a 48-plex sequencing run on the Sequel II System (Figure 2). All three known m6A motifs were successfully detected. In addition, the high coverage weakly detected the known target of the Dcm m5C methylase, CCWGG. However, since m5C calling is not supported, it was erroneously tagged as m6A.
An important takeaway is that to obtain the cleanest motif-finding result, the ‘Minimum Qmod Score’, available as an advanced parameter in the ‘Base Modification Analysis’ application in SMRT Analysis, had to be increased manually. As shown by the red arrow in Figure 2, this value should be set such that it excludes most baseline noise while fully including the cloud of methylation signal. In this example, the ideal setting is Qmod = 200. While the optimal value of Qmod changes with sequencing coverage, we have found a value of 100 produces a good result in most cases when sequencing 48 microbes per SMRT Cell 8M.
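To illustrate what raising the ‘Minimum Qmod Score’ does, the following minimal sketch applies the same kind of cutoff to a list of modification calls after the fact. It is not the SMRT Analysis implementation. It assumes the calls are available as GFF-style, tab-separated lines with the modification score in the sixth column; the function name `filter_by_min_qmod` and the example rows are hypothetical and made up for illustration.

```python
from typing import Iterable, Iterator


def filter_by_min_qmod(gff_lines: Iterable[str], min_qmod: float = 100.0) -> Iterator[str]:
    """Yield header lines plus feature rows whose score is at least min_qmod.

    Assumption: the score (treated here as the Qmod value) sits in the
    sixth tab-separated column, as in standard GFF.
    """
    for line in gff_lines:
        row = line.rstrip("\n")
        if not row:
            continue  # skip blank lines
        if row.startswith("#"):
            yield row  # keep header and comment lines unchanged
            continue
        fields = row.split("\t")
        if len(fields) < 6 or fields[5] == ".":
            continue  # no usable score on this row
        if float(fields[5]) >= min_qmod:
            yield row


# Hypothetical rows for illustration only (not real sequencing results).
example = [
    "##gff-version 3",
    "contig1\texample_source\tm6A\t1200\t1200\t250\t+\t.\tcoverage=180",
    "contig1\texample_source\tmodified_base\t3400\t3400\t35\t-\t.\tcoverage=175",
    "contig1\texample_source\tm4C\t5100\t5100\t120\t+\t.\tcoverage=182",
]

kept = list(filter_by_min_qmod(example, min_qmod=100))
for row in kept:
    print(row)
```

With the cutoff at 100, the low-scoring row is dropped while the two higher-scoring calls are kept. Raising the cutoff to 200 would also drop the row scored 120, which is why the threshold should sit just above the baseline noise rather than as high as possible.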
To better assess performance across the full range of methylation patterns seen in microbes, we then analyzed data from four more challenging microbes. These more difficult examples confirm that the Sequel II System can detect both m6A and m4C at the same level of performance seen with our previous sequencing systems. The known R-M systems in Neisseria meningitidis FAM18 (Table 1), Treponema denticola A (Table 2), and Methanocorpusculum labreanum Z (Table 3) were largely recovered at high confidence. The few exceptions are likely due to competition between multiple methyltransferases that target overlapping motifs.
The most difficult test case was H. pylori J99, which carries 24 distinct R-M systems producing m6A, m4C, and m5C modifications. We called 21 of the 24 motifs exactly. In one instance our motif caller was confounded by overlapping motifs, but the correct answer could be easily discerned by visual examination. The remaining two missed motifs involve m5C, which continues to be unsupported.
We hope these results will give all our customers who study prokaryotic methylation the confidence to move forward with planning bacterial whole genome sequencing experiments on the Sequel II System, taking full advantage of the higher multiplexing capacity and reduced per-sample cost.
Learn more about bacterial whole genome sequencing and prokaryotic epigenetics on the Sequel II System.
March 18, 2020 | Plant + animal biology