An article published today in Genetics in Medicine from Jason Merker, Euan Ashley, and colleagues at Stanford University reports the first successful application of PacBio whole genome sequencing to identify a disease-causing mutation. (Stanford has also published a news release.) The authors describe an individual who, over 20 years, presented with a series of benign tumors in his heart and glands. The individual satisfied the clinical criteria for Carney complex, but after eight years of genetic evaluation, including short-read whole genome sequencing, experts were still unable to pinpoint the underlying genetic mutation and confirm a diagnosis.
Ultimately, the authors turned to the Sequel System to evaluate structural variants, large genetic differences that involve at least 50 base pairs and are often discoverable only with long-read sequencing. This quickly led to the identification of the causative mutation: a 2.2 kb deletion that affects PRKAR1A, the gene most commonly mutated in Carney complex. This case demonstrates the ability of long-read sequencing on the Sequel System to reveal genetic variation that is inaccessible with short-read technologies and highlights the potential to apply PacBio sequencing to precision medicine [1].
A human genome has around 20,000 structural variants (differences ≥50 bp) spanning 10 Mb, more base pairs than single nucleotide variants and small indels put together. Because structural variants tend to lie in repetitive regions of the genome and/or are larger than short-read sequencers can span, the vast majority (80%) are identified only by long-read sequencing. This means even so-called “whole” genome sequencing with short reads misses much of the variation in a human genome [2].
Figure 1. Structural variation in the human genome. (a) Types of structural variation. (b) Differences between two typical human genomes. (c) Structural variants detected in a typical human genome with PacBio sequencing compared to short-read sequencing.
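To make the 50 bp size threshold concrete, here is a minimal Python sketch that sorts a variant into one of the classes mentioned above by comparing allele lengths. The function name `classify_variant` and the constant `SV_MIN_LENGTH` are illustrative assumptions, not part of any tool used in the study, and the sketch only covers insertions and deletions given as explicit sequences (inversions, translocations, and other rearrangements would need different handling).

```python
SV_MIN_LENGTH = 50  # bp; the structural-variant threshold quoted in the text


def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant from its reference and alternate allele sequences.

    Simplifying assumption: both alleles are explicit sequences, so only
    substitutions, insertions, and deletions can be recognized.
    """
    if len(ref) == len(alt):
        # Same length: a single-base change or a longer substitution.
        return "SNV" if len(ref) == 1 else "substitution"
    size_change = abs(len(alt) - len(ref))
    if size_change >= SV_MIN_LENGTH:
        return "structural variant"
    return "small indel"


print(classify_variant("A", "G"))
print(classify_variant("A", "A" + "T" * 50))
print(classify_variant("A" + "C" * 49, "A"))
```

The three calls cover a single-base change, a 50 bp insertion that just meets the threshold, and a 49 bp deletion that falls just below it.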
Carney complex, a multiple neoplasia syndrome, is exceedingly rare, with fewer than 750 cases ever reported. Most individuals with the syndrome have a mutation that inactivates one of the two copies of the gene PRKAR1A. However, in the case reported today, clinical sequencing of PRKAR1A did not reveal any mutations. Short-read whole genome sequencing was then applied to look for mutations throughout the genome, but it was uninformative. Ashley, Merker, and colleagues therefore turned to PacBio long-read sequencing to evaluate structural variants missed by previous methods.
The Sequel System was used to generate approximately eight-fold coverage of the human genome. Reads were mapped with NGM-LR [3], and structural variants were called with PBHoney [4], yielding 6,971 deletions and 6,821 insertions. These were filtered for rare, genic variants associated with disease genes, which left only six candidates for manual evaluation. One of the six variants was a heterozygous 2.2 kb deletion that removes the first coding exon of PRKAR1A. The variant was evaluated with Sanger sequencing in the individual and his parents, which showed that the deletion is de novo, that is, absent from both parents.
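The filtering step described above (rare, genic, in a disease gene) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the `StructuralVariant` record, its `population_frequency` field, the `filter_candidates` helper, and the 1% frequency cutoff are all hypothetical, and the input values are toy placeholders rather than real calls.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class StructuralVariant:
    chrom: str
    start: int
    end: int
    sv_type: str                 # "DEL" or "INS"
    gene: Optional[str]          # overlapping gene symbol, or None if not genic
    population_frequency: float  # hypothetical annotation; 0.0 if never seen before


def filter_candidates(
    variants: Iterable[StructuralVariant],
    disease_genes: Set[str],
    max_frequency: float = 0.01,  # assumed cutoff, not taken from the study
) -> List[StructuralVariant]:
    """Keep variants that are rare, genic, and in a known disease gene."""
    return [
        v for v in variants
        if v.gene is not None
        and v.gene in disease_genes
        and v.population_frequency <= max_frequency
    ]


# Toy input with placeholder coordinates and frequencies (not real data).
calls = [
    StructuralVariant("chr17", 1_000, 3_200, "DEL", "PRKAR1A", 0.0),
    StructuralVariant("chr1", 5_000, 5_400, "INS", None, 0.0),
    StructuralVariant("chr2", 9_000, 9_300, "DEL", "GENE_X", 0.35),
]
candidates = filter_candidates(calls, disease_genes={"PRKAR1A", "GENE_X"})
print([v.gene for v in candidates])
```

In this toy run only the PRKAR1A entry survives: the second call is not genic and the third is too common. Applying the same kind of successive filter to the roughly 13,800 calls in the study is what reduced the list to a handful of candidates that could be reviewed by hand.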
Approximately two-thirds of individuals with presumed genetic disorders remain undiagnosed even after short-read exome and whole genome sequencing. It is hypothesized that many of the undiagnosed cases are explained by variants missed by short-read sequencing technologies, most notably structural variants, variants in GC-rich regions of the genome, and repeat expansions [5]. The study published today provides a proof of principle that PacBio long-read sequencing identifies previously overlooked structural variants, even at relatively low sequencing coverage. We are excited by upcoming studies that will evaluate many more cases to quantify the improvement in diagnostic yield from long-read sequencing, and to demonstrate that precision medicine requires a comprehensive view of genetic variation.
[1] Merker JD, et al. (2017). Genetics in Medicine.
[2] Huddleston J, et al. (2017). Genome Research, 27(5):677-685.
[3] https://github.com/philres/nextgenmap-lr
[4] English AC, et al. (2015). BMC Genomics, 16:286.
[5] Biesecker LG, et al. (2011). Genome Biology, 12(9):128.