A new publication from scientists in the Netherlands and Belgium offers tantalizing insights that may shed light on age-related neurodegenerative disorders. The team used SMRT Sequencing to produce a de novo diploid assembly of the genome of a Dutch woman named Hendrikje van Andel-Schipper, who died at the age of 115 with no signs of cognitive decline, and then performed a detailed analysis of the variants detected. The data are publicly available to the scientific community.
The paper, published in Translational Psychiatry, comes from lead author Jasper Linthorst and senior author Henne Holstege (@HolstegeHenne) at Amsterdam Neuroscience and their collaborators. They aimed to identify structural variants (SVs) that could be associated with the onset of neurological disorders. To do so, they compared the centenarian genome assembly with several previously available human genome assemblies.
The team chose long-read PacBio sequencing technology because they determined that “due to their repetitive nature, [SVs] are currently underexplored in short-read whole genome sequencing approaches,” they write. Repetitive regions, particularly repeat expansions that tend to grow larger over generations, have been shown to be pathogenic for a number of neurological diseases. “Using common sequencing approaches, the assessment of large repetitive regions is difficult because short 100-150 bp sequence-reads do not span the entire structural variant,” the authors report. “The solution to this problem is to generate longer sequencing reads.”
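The spanning argument in the quote can be made concrete with a little arithmetic. The sketch below is only an illustration, not part of the authors' analysis: it counts how many start positions let a single read cover an entire repeat plus some unique flanking sequence on each side. The function name, the 20 bp flank requirement, the 10,000 bp long-read length, and the example repeat sizes are all assumptions chosen for illustration; only the 150 bp short-read length comes from the text.

```python
def spanning_offsets(read_len: int, repeat_len: int, min_flank: int = 20) -> int:
    """Count read start positions at which one read covers the whole repeat
    plus at least `min_flank` bp of unique sequence on each side.

    A result of 0 means no read of this length can span the repeat.
    `min_flank` is an illustrative assumption, not a value from the paper.
    """
    needed = repeat_len + 2 * min_flank  # repeat plus both anchors
    return max(0, read_len - needed + 1)


if __name__ == "__main__":
    # 150 bp is the short-read length quoted above; 10,000 bp is an
    # assumed long-read length used only for comparison.
    for read_len in (150, 10_000):
        for repeat_len in (60, 500, 3_000):  # hypothetical repeat sizes
            n = spanning_offsets(read_len, repeat_len)
            status = "can span" if n > 0 else "cannot span"
            print(f"{read_len:>6} bp read, {repeat_len:>5} bp repeat: {status} ({n} offsets)")
```

Under these assumptions, a 150 bp read has no placement that spans a 500 bp repeat, while a read of several kilobases has thousands.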
For this project, the scientists generated a de novo, phased genome assembly for the 115-year-old woman, which they refer to as W115. This was based on sequencing genomic DNA from three tissues and relied on FALCON-Unzip to create the diploid assembly of about 2.82 Gb. This information was compared to two haploid assemblies and the latest human reference genome to search for SVs of 50 bp or longer.
The scientists used a graph-based multi-genome aligner called REVEAL and found a total of 31,680 SVs. Nearly 70% were classified as variable number tandem repeats (VNTRs). “Interestingly, we observed that VNTRs in the subtelomeric regions were composed of longer repeat subunits than VNTRs outside the subtelomeric regions, and that they had a higher GC-content,” they report. Expanded VNTRs have been linked to faulty gene transcription. “The genes that contained most VNTRs, of which PTPRN2 and DLGAP2 are the most prominent examples, were found to be predominantly expressed in the brain and associated with a wide variety of neurological disorders,” the scientists add.
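For readers less familiar with the terms, a VNTR is a short motif (the repeat subunit) repeated head to tail, with the copy number varying between genomes. The toy sketch below shows the three quantities discussed above: subunit length, copy number, and GC content. It is not how REVEAL or the authors detect VNTRs, and the sequences and helper names are invented for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def tandem_copies(seq: str, motif: str, start: int = 0) -> int:
    """Count consecutive exact copies of `motif` in `seq` beginning at `start`.

    Real VNTR callers tolerate mismatches and indels between copies;
    this toy version requires perfect copies.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    copies = 0
    pos = start
    while seq.startswith(motif, pos):
        copies += 1
        pos += len(motif)
    return copies


if __name__ == "__main__":
    # Hypothetical motif and two hypothetical alleles with different copy numbers.
    motif = "GGCGCA"
    allele_a = motif * 4 + "TTAC"
    allele_b = motif * 9 + "TTAC"
    for name, allele in (("allele A", allele_a), ("allele B", allele_b)):
        n = tandem_copies(allele, motif)
        print(f"{name}: {n} copies of a {len(motif)} bp subunit, "
              f"subunit GC content {gc_content(motif):.2f}")
```

Because the two alleles differ by several copies of the subunit, the length difference between them is what an assembly comparison would report as a structural variant once it reaches the 50 bp threshold.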
In addition, the team analyzed the list of structural variants to see how SMRT Sequencing had made a difference in detection. Using short-read data for the W115 genome only, they found just 5,826 SVs. About 83% of the SVs — that’s more than 18,000 variants — found in the PacBio assembly “were uniquely identified through long-read sequencing,” the scientists note.
The sequence data for this project were produced on a PacBio RS II system, but Holstege and her team have already acquired a Sequel II System for the next phase of this effort. That will involve a large study encompassing at least 150 cognitively healthy centenarians and 150 individuals with Alzheimer’s disease, with the goal of identifying VNTRs that have significantly different lengths between the two groups. Holstege and her team will be generating HiFi reads, and they expect to cover each genome in the study with a single SMRT Cell. “We want to know what about these individuals makes them so special,” she told us.
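One generic way to ask whether a VNTR has "significantly different lengths between the two groups" is a permutation test on the group means. The sketch below is an assumption about how such a comparison could be set up, not the team's stated analysis plan; the function name, the choice of test statistic, and the example lengths are all made up for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2_000, seed=0):
    """Two-sided permutation test on the difference in mean VNTR length.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose absolute mean difference is at least as large as the
    observed one (with the usual +1 correction so it is never zero).
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Invented repeat lengths (in bp) for a single hypothetical VNTR.
    centenarians = [120, 126, 132, 120, 126, 138, 126, 132]
    alzheimers = [150, 162, 156, 168, 150, 174, 162, 156]
    p = permutation_p_value(centenarians, alzheimers)
    print(f"permutation p-value: {p:.4f}")
```

In a real study with thousands of VNTRs tested at once, a multiple-testing correction would also be needed; that step is omitted here.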
Interested in learning more about this research? Watch Holstege’s presentation, “Uncovering Neurological Disorders Through an Examination of VNTRs,” now available on demand. Explore workflows and additional resources on comprehensive variant detection or structural variant detection.
November 10, 2020 | Human genetics research