Traditional RNA-Seq works by fragmenting cDNA and then sequencing the fragments with paired-end short reads. The problem comes during assembly, when full-length isoforms have to be reconstructed from those fragments. This is computationally challenging, and sometimes intractable.
The solution? Long-read isoform sequencing, according to PacBio Principal Scientist Elizabeth Tseng and PacBio user Gloria Sheynkman, a research fellow at Dana-Farber Cancer Institute. The two recently participated in a webinar, sharing their experiences using PacBio’s Iso-Seq method.
Tseng started by explaining the method and some of its applications.
“In contrast to traditional RNA-Seq, the Iso-Seq method produces full-length cDNA, and using the PacBio long, accurate reads, can sequence the full transcript. No assembly is required,” Tseng said.
She also discussed some of the bioinformatics tools available, including SQANTI2, which can be used to classify full-length transcripts against annotations such as GENCODE, and as a quality control tool.
The Iso-Seq method in action
At Dana-Farber, Sheynkman uses the Iso-Seq method to characterize cancer cells and create complete, accurate transcriptomes.
In one example, she ran five breast cancer cell lines and eight melanoma samples in one pooled library on a single SMRT Cell 8M each, achieving around 6 million polymerase reads, with an average base yield of 300 Gb and an average polymerase read length of ~50 kb. Sequencing was performed by Maryland Genomics, a PacBio certified service provider.
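As a quick sanity check, the run statistics quoted above are consistent with one another: total base yield is roughly the number of polymerase reads multiplied by the average polymerase read length. The short sketch below works through that arithmetic. The function name `expected_yield_gb` is a hypothetical helper introduced here for illustration, and the inputs are the rounded figures from the text, not exact run metrics.

```python
def expected_yield_gb(num_reads: int, mean_read_length_bp: float) -> float:
    """Approximate total base yield in gigabases (1 Gb = 1e9 bases).

    Hypothetical helper for illustration; assumes yield is simply
    read count times mean read length.
    """
    return num_reads * mean_read_length_bp / 1e9


# Rounded figures quoted in the text (assumed, not exact run metrics)
polymerase_reads = 6_000_000       # ~6 million polymerase reads
mean_polymerase_length = 50_000    # ~50 kb average polymerase read length

yield_gb = expected_yield_gb(polymerase_reads, mean_polymerase_length)
print(f"Approximate base yield: {yield_gb:.0f} Gb")  # prints "Approximate base yield: 300 Gb"
```

With about 6 million reads averaging about 50 kb, the estimate comes to about 300 Gb, matching the average base yield reported for the run.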
With the improvements in the chemistry of the new Sequel II System, Sheynkman said she has been able to capture a much wider range of transcript lengths, without having to do size selection.
“Overall, we’re really detecting a much larger range, with cDNA molecules up to 6 and 7 kb,” she said.
For both breast cancer and melanoma, she obtained about 14,000 unique genes and 11,000 unique isoforms, around 30% of which were novel. And each SMRT Link Iso-Seq job was completed in just 6-9 hours, she noted.
Towards accurate isoform quantification
But the most promising application for Sheynkman is isoform quantification.
“I think this is a really important goal for the field,” she said. “To know how many copies of each transcript is expressed in the cell will really open up a lot of avenues in biomedical research, such as having consistent biomarkers, understanding disease mechanisms, and even just fundamental biological understanding.”
While short-read sequencing can achieve gene quantification with reliable results, the complexity of isoform structure requires more comprehensive coverage.
“Isoform quantification methods are very dependent on having accurate transcript models,” Sheynkman said.
The improved depth of coverage and reduced bias in sampling full-length reads on the new Sequel II System should mitigate these limitations, Sheynkman said. And the high technical reproducibility achieved by PacBio is another big strength, she added.
Targeted or whole transcriptome sequencing?
Sheynkman further discussed cases in which targeted isoform sequencing may be preferred over whole transcriptome sequencing. By targeting only genes of interest, rare isoforms could be identified with low to moderate sequencing effort. Multiplexing could further reduce the cost and increase sample size, which could be useful for applications such as biomarker discovery and validation.
Sheynkman also presented a new solution to a common problem in targeted sequencing: probes. ORF Capture-Seq is designed as an easy, versatile option for making capture probes directly from available clones or PCR products within a single day, using low-cost molecular biology reagents. It also allows for the generation of many complex probe sets tailored to different genes, Sheynkman said. The complexity of the probe sets allows you to target anywhere from 1 to 1,000 genes.
The webinar also included a lively Q&A that is well worth a listen.