Jim Lupski’s long-standing interest in genomics is both personal and professional. His personal interest dates from his teenage years, when he was diagnosed with a rare genetic disease called Charcot-Marie-Tooth (CMT) neuropathy. As a clinician and scientist, he made it his mission to find the genetic basis of CMT, and in 1991 he published his discovery of the CMT1A duplication, pioneering the field of structural variation, particularly copy number variation. Today he is a practicing pediatrician and a professor of molecular and human genetics at Baylor College of Medicine, where he is the Principal Investigator of the NHGRI Center for Mendelian Genomics. Highlights from his recent conversation with Mendelspod host Theral Timpson are below.
Moving Beyond the Gene-centric View
During the early days of Lupski’s career, genetics was based on a gene-centric, Mendelian view of biology. That view enabled him and his team to identify the most common causative CMT gene (PMP22, related to 70% of the cases), but the genetic cause of his own disease eluded him until genomics technology became more advanced, enabling researchers to look at more than one disease locus at a time. Lupski likens the advent of genomics to the impact of Einstein’s theories on Newtonian science: “When Einstein physics came around, it didn’t say that the Newtonian view was wrong, it just generalized the concept of relativity,” he told Timpson. “I think genomics more generalizes the concept of disease traits because we can look at more than one locus at a time.” A genome-wide view allows researchers to delve into the subtleties of genetic disease, including potential driver vs. modifier alleles, as well as the occurrence of de novo mutations, which can’t be tracked through lineages.
Generating Data Faster than We Can Analyze It
“Gene discoveries [are] happening at a much greater pace … rather than reporting a gene a year, or a gene a month, or even a gene a week, we’re talking about hundreds of genes at a time,” said Lupski. He believes that data generated by clinical implementation may soon outstrip data from research laboratories, and that data is being generated faster than scientists can analyze it. “We’ve got to dig into these data,” he added. He notes that when clinicians uncover a causative gene, they often leave the rest of the genome unexplored. Conversely, in the 75% of cases where no plausible explanation is found in a patient’s genome, patients should be given the option to share the data with researchers for basic research, as is done at the Center for Mendelian Genomics at Baylor.
The Benefit of Long Reads
Lupski’s genome has been published three times. The most recent assembly was done with PacBio sequencing, which identified significantly more structural variation than the other technologies, including three times as many copy number variations as were found in 10 different whole genome sequencing runs using short-read methods. Lupski imagines a clinic where he could use PacBio long reads, layering short-read sequencing data on top for precision. “De novo assembly in the clinic, to me, would be the goal that you would really strive for,” he said.
January 21, 2016 | General