Over the last 50 years, we have witnessed monumental achievements in genetics and biology. From the invention of DNA sequencing technologies to the completion of the Human Genome Project and the emergence of genomic medicine, we have made many intellectual and technological leaps in our understanding of life’s fundamental blueprint. Yet as our knowledge deepens, we have also come to appreciate that many secrets remain hidden in the genome. The field of genetics and genomics now stands at the threshold of a transformative era, in which unraveling these mysteries hinges on our understanding of structural variation. With the right tools and technologies finally available, our capacity to study and comprehend structural variation is greater than it has ever been.
What is structural variation?
Structural variants (SVs) are differences in DNA sequence between individuals that range in size from 50 base pairs (bp) to more than a million base pairs.
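To make the size boundary concrete, here is a minimal Python sketch that bins a variant by the number of bases it affects. The 50 bp lower bound comes from the definition above. The category labels and the function name classify_by_size are hypothetical, chosen only for illustration; real variant callers apply their own, more detailed rules.

```python
# Minimal sketch: bin a variant by size, using the 50 bp lower bound for SVs.
# Hypothetical: the labels "SNV", "small variant", "SV" and this function name.

SV_MIN_BP = 50  # SVs affect at least 50 base pairs


def classify_by_size(length_bp: int) -> str:
    """Return a coarse size category for a variant affecting `length_bp` bases."""
    if length_bp < 1:
        raise ValueError("a variant must affect at least one base")
    if length_bp == 1:
        return "SNV"  # single nucleotide variant
    if length_bp < SV_MIN_BP:
        return "small variant"  # 2-49 bp, below the SV threshold
    return "SV"  # 50 bp up to a million bp or more


for size in (1, 12, 50, 1_200_000):
    print(f"{size:>9,} bp -> {classify_by_size(size)}")
```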
Since the invention of DNA sequencing in 1977, the focus of genetics and genomics has arguably been on the subtleties of single-base-pair differences, known as single nucleotide variants (SNVs). Frequently cited claims that any two human genomes are 99.9% identical, or that human and chimpanzee genomes share 99% of their genetic material, are grounded in this “single nucleotide view” of genomics.1,2 Structural variation, on the other hand, casts a much broader net. It encompasses a diverse spectrum of large-scale genetic changes, including inversions, balanced translocations, insertions, and deletions. Insertions and deletions can create genomic imbalances known as copy number variants (CNVs): regions of DNA where the number of copies of a gene or DNA segment differs between individuals. When a CNV is present in more than 1% of a population, it is deemed a copy number polymorphism (CNP).3
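The CNV-versus-CNP distinction is a simple frequency threshold. The Python sketch below applies it to a toy population under several hypothetical assumptions: each individual is summarized by a single integer copy number for the region, the reference copy number is 2 (as for an autosomal region in a diploid genome), and "present in more than 1% of a population" is read as more than 1% of individuals carrying a non-reference copy number. The function names and population counts are made up for illustration.

```python
# Minimal sketch of the CNP rule: a CNV seen in more than 1% of a population.
# Assumptions (hypothetical): one integer copy number per individual, a
# reference copy number of 2, and frequency measured as the fraction of
# individuals whose copy number differs from the reference.

REFERENCE_COPY_NUMBER = 2
CNP_THRESHOLD = 0.01  # "more than 1% of a population"


def carrier_fraction(copy_numbers: list[int], reference: int = REFERENCE_COPY_NUMBER) -> float:
    """Fraction of individuals whose copy number differs from the reference."""
    if not copy_numbers:
        raise ValueError("need at least one individual")
    carriers = sum(1 for cn in copy_numbers if cn != reference)
    return carriers / len(copy_numbers)


def is_cnp(copy_numbers: list[int], reference: int = REFERENCE_COPY_NUMBER) -> bool:
    """True if the CNV is carried by more than 1% of individuals."""
    return carrier_fraction(copy_numbers, reference) > CNP_THRESHOLD


# Toy populations of 1,000 made-up individuals each.
common = [2] * 985 + [3] * 10 + [1] * 5  # 1.5% carry a non-reference copy number
rare = [2] * 995 + [3] * 5               # 0.5% carry a non-reference copy number
print(is_cnp(common), is_cnp(rare))      # True False
```

Because the rule says "more than" 1%, a CNV carried by exactly 1% of individuals is not classified as a CNP in this sketch.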
The full significance of structural variation likely extends far beyond what we currently know. In this evolving research landscape, the challenge lies both in uncovering the full extent of structural variation and in routinely genotyping these diverse variants. Structural variation represents a new frontier in genetics, offering a way to explore questions ranging from human disease and complex traits to the evolution and diversification of life itself.
What are the different types of structural variation?
Structural variation is a multifaceted phenomenon, encompassing diverse types of changes in the genome. The following are some of the most recognized types of structural variation:
- Insertions: New genetic material is added into the genome, introducing novel sequences.
- Deletions: Segments of DNA are lost from the genome, shortening the sequence.
- Duplications: A DNA segment is copied, resulting in two or more identical or near-identical copies.
- Inversions: A DNA segment is flipped end to end in place, so its sequence reads in the reverse orientation relative to the surrounding DNA.
- Translocations: DNA sequences are exchanged between different chromosomes, leading to new combinations of genetic material.
- Copy number variants (CNVs): Regions of the genome where the number of copies may vary among individuals, giving rise to genomic imbalances.
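The SV types listed above can be illustrated with a toy model. The Python sketch below applies each type to a short, made-up sequence. It assumes 0-based, half-open coordinates, models an inversion as the reverse complement of the segment (the flipped segment ends up on the opposite strand), models a translocation as a reciprocal swap of chromosome ends, and shows a CNV as the change in copy number of a segment after a deletion or duplication. All function names are hypothetical; this is a teaching sketch, not how SV tools represent variants.

```python
# Toy model of the SV types listed above, applied to made-up sequences.
# Assumptions: 0-based, half-open [start, end) coordinates; an inversion is the
# reverse complement of the segment; a translocation is a reciprocal swap.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def insertion(seq: str, pos: int, new: str) -> str:
    return seq[:pos] + new + seq[pos:]


def deletion(seq: str, start: int, end: int) -> str:
    return seq[:start] + seq[end:]


def duplication(seq: str, start: int, end: int) -> str:
    # Tandem duplication: the copy lands immediately after the original segment.
    return seq[:end] + seq[start:end] + seq[end:]


def inversion(seq: str, start: int, end: int) -> str:
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def translocation(chr_a: str, chr_b: str, break_a: int, break_b: int) -> tuple[str, str]:
    # Reciprocal: the segments beyond each breakpoint trade places.
    return chr_a[:break_a] + chr_b[break_b:], chr_b[:break_b] + chr_a[break_a:]


def copy_number(seq: str, segment: str) -> int:
    # Counts non-overlapping occurrences; a CNV is a change in this number.
    return seq.count(segment)


ref = "CCCCAAGTCCCC"  # the segment of interest, AAGT, spans [4, 8)
print(insertion(ref, 4, "GG"))               # CCCCGGAAGTCCCC
print(deletion(ref, 4, 8))                   # CCCCCCCC
print(duplication(ref, 4, 8))                # CCCCAAGTAAGTCCCC
print(inversion(ref, 4, 8))                  # CCCCACTTCCCC
print(translocation(ref, "TTTTGGGG", 8, 4))  # ('CCCCAAGTGGGG', 'TTTTCCCC')

variants = {"reference": ref, "deletion": deletion(ref, 4, 8), "duplication": duplication(ref, 4, 8)}
print({name: copy_number(s, "AAGT") for name, s in variants.items()})
# {'reference': 1, 'deletion': 0, 'duplication': 2}
```

The last line shows why deletions and duplications give rise to CNVs: they change how many copies of a segment a genome carries, while an inversion or a balanced translocation rearranges sequence without changing its amount.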
Together, these types of structural variation are thought to be major drivers of genetic diversity, contributing to the complexity and richness of life on Earth. They can also be the root cause of various genetic diseases in humans, underscoring their importance in biomedical research.
Why is structural variation important?
The study of structural variation is an exciting domain in genetics and genomics with potentially far-reaching implications for human health, agriculture, and our fundamental understanding of biology. These large genomic alterations, including deletions, duplications, insertions, inversions, and translocations, often occur as complex combinations of DNA gains, losses, and rearrangements. Perhaps unsurprisingly, there is growing evidence that much of the genetic variation within a population exists at the structural level of the genome.4,5
The influence of structural variants on phenotype
One of the most intriguing aspects of SVs is their capacity to exert an observable impact on an organism’s phenotype. SVs have been shown to disrupt or alter gene function, modify gene regulation, and change gene dosage.6 Multiple studies have highlighted the role of SVs in driving functional changes across populations and species.2,7 At the same time, the influence of SVs on molecular biology and medicine is becoming increasingly evident. Among neurological diseases, for example, Parkinson’s disease has been associated with a pure pentanucleotide repeat expansion in the ATXN10 gene.8 Similarly, expansion of the CAG repeat in the HTT gene, which can reach the size of an SV, causes Huntington’s disease.9 In addition, an SVA retrotransposon insertion within the TAF1 gene is associated with X-linked dystonia-parkinsonism, and expansion of a hexameric repeat within that insertion correlates with age at disease onset.10 In cancer, a variety of SVs have been identified as drivers, including gene deletions and rearrangements, gene amplifications, gene fusions, and the reshuffling of gene regulatory elements.11,12,13
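Repeat expansions such as the CAG expansion in Huntington’s disease cross into SV territory through simple arithmetic: the length of a repeat tract is the number of copies multiplied by the length of the repeat unit, so a CAG tract reaches the 50 bp SV threshold at 17 copies (51 bp). The Python sketch below finds the longest pure run of a repeat unit in a sequence and applies that arithmetic. It assumes uninterrupted (pure) repeats read on one strand; the function names and the example sequence are made up and do not represent a real HTT allele.

```python
import re

SV_MIN_BP = 50  # lower size bound for an SV


def longest_repeat_run(seq: str, unit: str) -> int:
    """Number of consecutive copies of `unit` in the longest pure tract of `seq`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    return max((len(m.group()) // len(unit) for m in pattern.finditer(seq)), default=0)


def tract_length_bp(copies: int, unit: str) -> int:
    """Length of a pure tract in base pairs: copies x unit length."""
    return copies * len(unit)


def reaches_sv_scale(copies: int, unit: str) -> bool:
    """True if the tract is at least as long as the 50 bp SV threshold."""
    return tract_length_bp(copies, unit) >= SV_MIN_BP


# Made-up sequence: a 42-copy CAG tract flanked by unrelated bases.
seq = "GCT" + "CAG" * 42 + "CCGCCG"
copies = longest_repeat_run(seq, "CAG")
print(copies, tract_length_bp(copies, "CAG"), reaches_sv_scale(copies, "CAG"))  # 42 126 True
```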
The influence of structural variation on plants and Mendelian disorders in humans
In Mendelian genetics, SVs are proving to have a major impact on diseases associated with deletions or duplications of genomic regions. For instance, complex SVs affecting genes such as ARID1B (associated with Coffin-Siris syndrome) and CDKL5 (associated with early infantile epileptic encephalopathy and Rett-like features) have been observed, and in these cases the SVs result in severe intellectual disability.14 Not all SVs are harmful, however, and most are likely not strictly deleterious. An analysis of 43 assembled human Y chromosomes, published in Nature, revealed that the size of the Y can vary nearly two-fold from one individual to another. Moreover, the low levels of base substitution observed in that analysis suggest that most variation in the Y is structural in nature.15
Botanical research has also helped reveal the importance of SVs, which can play a pivotal role in shaping plant phenotypes. SVs can confer tolerance to environmental stresses and positively impact crop yield and quality. For instance, increases in gene copy number have been associated with aluminum tolerance in maize, boron toxicity tolerance in barley, and glyphosate resistance in the weed Amaranthus palmeri.16,17,18 In tomato, the duplication of a domestication locus has been shown to neutralize a cryptic variant that otherwise created a breeding barrier.19
Despite their increasingly recognized importance, SVs remain less explored than SNVs. The main challenge is identifying them: SVs are large, often form intricate patterns within the genome, and are frequently associated with repetitive sequences. Short sequencing reads are often shorter than the variant itself or the repeat it sits in, so they cannot span these regions and the SV must be inferred indirectly. This complexity is compounded by sequencing and mapping errors, by the similar signatures that different types of SVs can produce, and by overlapping or nested SVs that further obscure the genomic landscape. However, the advent of high-throughput, highly accurate long-read sequencing technologies such as PacBio HiFi sequencing now makes SV research much easier.
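One reason long reads help is that a single read can span an entire SV together with unique flanking sequence, so the event appears directly in that read’s alignment. In the SAM alignment format, insertions and deletions relative to the reference are recorded in the CIGAR string as I and D operations. The Python sketch below scans one CIGAR string for insertions and deletions of at least 50 bp. It is a simplified illustration: it looks at a single alignment, ignores the split (supplementary) alignments that larger or more complex SVs produce, and does not merge evidence across reads the way dedicated SV callers do. The function name large_indels and the example alignment are hypothetical.

```python
import re

# CIGAR operations defined by the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference
SV_MIN_BP = 50


def large_indels(pos: int, cigar: str, min_len: int = SV_MIN_BP) -> list[tuple[str, int, int]]:
    """Return (type, reference position, length) for I/D operations >= min_len.

    `pos` is the 1-based leftmost reference position of the alignment (SAM POS).
    For a deletion the position is the first deleted reference base; for an
    insertion it is the reference base immediately after the inserted sequence.
    """
    events = []
    ref = pos
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in ("I", "D") and length >= min_len:
            events.append(("INS" if op == "I" else "DEL", ref, length))
        if op in REF_CONSUMING:
            ref += length
    return events


# Hypothetical long-read alignment starting at reference position 1,000.
cigar = "5S3000M120D2500M300I4000M10D200M"
print(large_indels(1000, cigar))  # [('DEL', 4000, 120), ('INS', 6620, 300)]
```

The 10 bp deletion near the end of the example is reported by the aligner but falls below the SV threshold, so the sketch skips it.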
HiFi sequencing offers an opportunity to search for answers hidden within structural variation
HiFi sequencing has revolutionized SV research, breathing new life into the field. Its ability to generate long (up to 20 kb) and highly accurate (Q30+) reads has made the detection and analysis of SVs significantly more accessible and efficient for researchers. The same long-read technology also supports complementary approaches such as Kinnex full-length transcriptomics, which enables the identification of functional SVs through the detection of alternative splicing and fusion transcripts.
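The "Q30+" figure uses the Phred quality scale, on which a quality score Q corresponds to an error probability of 10^(-Q/10). Q30 therefore means a 1-in-1,000 chance that a base is wrong, or 99.9% accuracy. The short Python sketch below converts between the two and estimates the expected number of errors in a 20 kb read at Q30 (about 20). The function names are illustrative, and treating every base as having exactly Q30 is a simplifying assumption; actual per-base qualities vary along a read.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for Phred quality q: p = 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality for error probability p: Q = -10 * log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def expected_errors(read_length: int, q: float) -> float:
    """Expected miscalled bases if every base had quality q (a simplification)."""
    return read_length * phred_to_error(q)


for q in (20, 30, 40):
    print(f"Q{q}: {1 - phred_to_error(q):.2%} accurate")
print(f"{expected_errors(20_000, 30):.1f} expected errors in a 20 kb read at Q30")
```

Each additional 10 points on the Phred scale cuts the error rate tenfold, which is why the jump from Q20 to Q30 matters so much for resolving SVs in repetitive regions.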
We are at a critical point in our ability to detect and understand structural variation. Insight and imagination have always driven human innovation, but it is technological advancement that lets us act on them and probe deeper into the unknown. As our capabilities evolve, we gain a clearer understanding of the role SVs play in biology, genetics, and disease. Ongoing efforts in benchmarking and in developing quality control standards are improving the accuracy with which SVs are identified. Armed with an expanding array of detection methods and a deepening understanding of their significance, we are well positioned to uncover the many hidden aspects of structural variation and its substantial impact on the living world.
References
1. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015). https://doi.org/10.1038/nature15393
2. Kronenberg, Z.N. et al. High-resolution comparative analysis of great ape genomes. Science 360, eaar6343 (2018). https://doi.org/10.1126/science.aar6343
3. Eichler, E.E. Copy number variation and human disease. Nature Education 1(3), 1 (2008).
4. Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021). https://doi.org/10.1126/science.abf7117
5. Liao, W.W., Asri, M., Ebler, J. et al. A draft human pangenome reference. Nature 617, 312–324 (2023). https://doi.org/10.1038/s41586-023-05896-x
6. Collins, R.L., Brand, H., Karczewski, K.J. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020). https://doi.org/10.1038/s41586-020-2287-8
7. Jeffares, D., Jolly, C., Hoti, M. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat Commun 8, 14061 (2017). https://doi.org/10.1038/ncomms14061
8. Schüle, B., McFarland, K.N., Lee, K. et al. Parkinson’s disease associated with pure ATXN10 repeat expansion. npj Parkinson’s Disease 3, 27 (2017). https://doi.org/10.1038/s41531-017-0029-x
9. McColgan, P. & Tabrizi, S.J. Huntington’s disease: a clinical review. Eur J Neurol 25, 24–34 (2018). https://doi.org/10.1111/ene.13413
10. Bragg, D.C., Mangkalaphiban, K., Vaine, C.A. et al. Disease onset in X-linked dystonia-parkinsonism correlates with expansion of a hexameric repeat within an SVA retrotransposon in TAF1. Proc Natl Acad Sci U S A 114(51), E11020–E11028 (2017). https://doi.org/10.1073/pnas.1712526114 (Published correction: Proc Natl Acad Sci U S A 117(11), 6277 (2020).)
11. Stransky, N., Cerami, E., Schalm, S. et al. The landscape of kinase fusions in cancer. Nat Commun 5, 4846 (2014). https://doi.org/10.1038/ncomms5846
12. Yi, K. & Ju, Y.S. Patterns and mechanisms of structural variations in human cancer. Exp Mol Med 50, 1–11 (2018). https://doi.org/10.1038/s12276-018-0112-3
13. Mahmoud, M., Gobet, N., Cruz-Dávalos, D.I. et al. Structural variant calling: the long and the short of it. Genome Biol 20, 246 (2019). https://doi.org/10.1186/s13059-019-1828-7
14. Sanchis-Juan, A., Stephens, J., French, C.E. et al. Complex structural variants in Mendelian disorders: identification and breakpoint resolution using short- and long-read genome sequencing. Genome Med 10, 95 (2018). https://doi.org/10.1186/s13073-018-0606-6
15. Hallast, P., Ebert, P., Loftus, M. et al. Assembly of 43 human Y chromosomes reveals extensive complexity and variation. Nature 621, 355–364 (2023). https://doi.org/10.1038/s41586-023-06425-6
16. Maron, L.G., Guimarães, C.T., Kirst, M. et al. Aluminum tolerance in maize is associated with higher MATE1 gene copy number. Proc Natl Acad Sci U S A 110(13), 5241–5246 (2013). https://doi.org/10.1073/pnas.1220766110
17. Sutton, T. et al. Boron-toxicity tolerance in barley arising from efflux transporter amplification. Science 318, 1446–1449 (2007). https://doi.org/10.1126/science.1146853
18. Gaines, T.A., Zhang, W., Wang, D. et al. Gene amplification confers glyphosate resistance in Amaranthus palmeri. Proc Natl Acad Sci U S A 107(3), 1029–1034 (2010). https://doi.org/10.1073/pnas.0906649107
19. Soyk, S., Lemmon, Z.H., Sedlazeck, F.J. et al. Duplication of a domestication locus neutralized a cryptic variant that caused a breeding barrier in tomato. Nat Plants 5, 471–479 (2019). https://doi.org/10.1038/s41477-019-0422-z