A new publication in the Journal of Human Genetics describes an impressive effort to identify the pathogenic variant causing progressive myoclonic epilepsy in two siblings. The scientific team used SMRT Sequencing to discover a 12.4 kb structural variant in a repetitive, GC-rich region after several other methods — including whole exome sequencing — failed to find the answer.
The paper comes from lead author Takeshi Mizuguchi, senior author Naomichi Matsumoto, and collaborators at Yokohama City University, Aichi Prefectural Colony Central Hospital, and other institutions in Japan. As the authors note, whole exome sequencing has delivered strong results for many cases that would otherwise have gone undiagnosed; for progressive myoclonic epilepsy in particular, the diagnostic yield is 31%. “However, the remaining 69% of cases present a genetic challenge,” the scientists report. “These findings suggest that certain types of pathogenic variation evade detection by the currently available genetic analysis.”
In this project, researchers were stumped by two siblings (a 20-year-old female and a 13-year-old male) who both showed signs of a severe neurodegenerative condition. While a genetic cause was strongly suspected, trio-based whole exome sequencing and a subsequent search for causative single nucleotide variants turned up no leads. The scientists then deployed SMRT Sequencing, focusing on structural variants ranging in size from 50 bp to 50 kb, especially in regions that are challenging for short-read platforms to sequence. They used the Sequel System to generate low-coverage whole genome sequence data for one affected sibling and three unrelated controls.
Analysis of the 6-fold coverage of the case sample with PacBio’s pbsv software identified more than 17,000 structural variants — including more than 7,200 deletions and nearly 10,000 insertions. The scientists filtered out structural variants seen in the control samples to quickly narrow the list of potentially causal candidates, and whittled the list further by selecting candidates that impact a coding gene. Fifty variants remained, five of which affected genes associated with an autosomal recessive phenotype. “Surprisingly, a 12.4-kb deletion call spanning the first coding exon of CLN6 was found,” the team writes. Biallelic mutations in CLN6 cause neuronal ceroid lipofuscinosis, a disease with clinical features that match those of the two siblings. Additional Southern blot and RT-PCR analysis validated the deletion and demonstrated that it was pathogenic.
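The filtering cascade described above (drop variants also seen in controls, keep those that hit a coding gene, then keep those in genes with a recessive phenotype) can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the `SV` and `Gene` records, the 100 bp breakpoint tolerance in `same_event`, and all coordinates and gene names in the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int
    end: int
    svtype: str  # "DEL" or "INS"


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    coding_exons: Tuple[Tuple[int, int], ...]  # closed intervals
    recessive: bool  # associated with an autosomal recessive phenotype


def same_event(a: SV, b: SV, tol: int = 100) -> bool:
    # Assumption: two calls are the "same" variant if type and chromosome
    # match and both breakpoints agree within `tol` bp.
    return (a.chrom == b.chrom and a.svtype == b.svtype
            and abs(a.start - b.start) <= tol
            and abs(a.end - b.end) <= tol)


def hits_coding(sv: SV, gene: Gene) -> bool:
    # True if the SV interval overlaps any coding exon of the gene.
    return sv.chrom == gene.chrom and any(
        sv.start <= exon_end and exon_start <= sv.end
        for exon_start, exon_end in gene.coding_exons)


def filter_candidates(case_svs: List[SV], control_svs: List[SV],
                      genes: List[Gene]):
    # Step 1: remove variants that also appear in any control sample.
    novel = [sv for sv in case_svs
             if not any(same_event(sv, c) for c in control_svs)]
    # Step 2: keep variants that affect a coding exon.
    coding = [(sv, g) for sv in novel for g in genes if hits_coding(sv, g)]
    # Step 3: keep variants in genes with a recessive phenotype.
    recessive = [(sv, g) for sv, g in coding if g.recessive]
    return novel, coding, recessive


# Toy data; every coordinate and gene name here is made up.
genes = [
    Gene("GENE_A", "chr1", ((1000, 1200), (5000, 5300)), True),
    Gene("GENE_B", "chr2", ((2000, 2400),), False),
]
case_svs = [
    SV("chr1", 500, 1500, "DEL"),    # deletes an exon of GENE_A
    SV("chr2", 2100, 2100, "INS"),   # insertion inside a GENE_B exon
    SV("chr1", 8000, 9000, "DEL"),   # intergenic
    SV("chr1", 5100, 5200, "DEL"),   # also present in a control
]
control_svs = [SV("chr1", 5090, 5210, "DEL")]

novel, coding, recessive = filter_candidates(case_svs, control_svs, genes)
for sv, gene in recessive:
    print(gene.name, sv)
```

In this toy run, four calls shrink to three after control filtering, two after the coding filter, and one after the recessive filter, mirroring on a small scale how the study went from more than 17,000 calls to five.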
With this finding in hand, the team went back to try to understand why the deletion had proven so elusive earlier. Two exome analysis methods “completely missed the homozygous CLN6 deletion … probably due to the scanty read coverage against CLN6 exon 1 with high GC content (77.6%) even in controls,” the scientists report. “By contrast, PacBio long reads showed uniform coverage … which improved the variant detection in GC-rich regions containing multiple repetitive elements.” Even with only three SMRT Sequencing reads of the CLN6 region, “the long sequences of the reads conferred excellent mappability and ensured the robust detection of [structural variants],” the team adds.
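GC content, the figure quoted for CLN6 exon 1, is simply the fraction of bases that are G or C. As a minimal sketch of how one might flag GC-rich windows that are likely to be poorly covered by short reads, the helper names, window size, and 0.75 threshold below are illustrative assumptions rather than anything taken from the paper.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    # Fraction of bases that are G or C; returns 0.0 for an empty string.
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_rich_windows(seq: str, window: int, step: int,
                    threshold: float) -> List[Tuple[int, float]]:
    # Slide a fixed-size window along the sequence and report the start
    # position and GC fraction of each window at or above the threshold.
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if gc >= threshold:
            hits.append((start, gc))
    return hits


# Toy sequence: an AT-only stretch, a GC-only stretch, another AT-only stretch.
toy = "ATATATATAT" + "GCGCGCGCGC" + "ATATATATAT"
print(gc_fraction("GCGCGGCCAT"))
print(gc_rich_windows(toy, window=10, step=10, threshold=0.75))
```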
The authors encourage other scientists to consider using long-read sequencing for similar cases where exome analysis reveals no pathogenic variants. They also call for the development of a robust structural variation database, along the lines of what gnomAD does for small variants. “For the purpose of reducing the number of candidate of diseases-causing mutations, it would be extremely beneficial if a public database for [structural variants] were available,” they note.
February 21, 2019 | Human genetics research