July 7, 2019

SV2: Accurate structural variation genotyping and de novo mutation detection from whole genomes.

Structural Variation (SV) detection from short-read whole genome sequencing is error prone, presenting significant challenges for population or family-based studies of disease. Here we describe SV2, a machine-learning algorithm for genotyping deletions and duplications from paired-end sequencing data. SV2 can rapidly integrate variant calls from multiple structural variant discovery algorithms into a unified call set with high genotyping accuracy and capability to detect de novo mutations. SV2 is freely available on GitHub (https://github.com/dantaki/SV2). Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com


July 7, 2019

Copy number variation and expression analysis reveals a nonorthologous pinta gene family member involved in butterfly vision.

Vertebrate (cellular retinaldehyde-binding protein) and Drosophila (prolonged depolarization afterpotential is not apparent [PINTA]) proteins with a CRAL-TRIO domain transport retinal-based chromophores that bind to opsin proteins and are necessary for phototransduction. The CRAL-TRIO domain gene family is composed of genes that encode proteins with a common N-terminal structural domain. Although there is an expansion of this gene family in Lepidoptera, there is no lepidopteran ortholog of pinta. Further, the function of these genes in lepidopterans has not yet been established. Here, we explored the molecular evolution and expression of CRAL-TRIO domain genes in the butterfly Heliconius melpomene in order to identify a member of this gene family as a candidate chromophore transporter. We generated and searched a four-tissue transcriptome and searched a reference genome for CRAL-TRIO domain genes. We expanded an insect CRAL-TRIO domain gene phylogeny to include H. melpomene and used 18 genomes from 4 subspecies to assess copy number variation. A transcriptome-wide differential expression analysis comparing four tissue types identified a CRAL-TRIO domain gene, Hme CTD31, upregulated in heads, suggesting a potential role in vision for this CRAL-TRIO domain gene. RT-PCR and immunohistochemistry confirmed that Hme CTD31 and its protein product are expressed in the retina, specifically in primary and secondary pigment cells and in tracheal cells. Sequencing of eye protein extracts that fluoresce in the ultraviolet identified Hme CTD31 as a possible chromophore binding protein. Although we found several recent duplications and numerous copy number variants in CRAL-TRIO domain genes, we identified a single copy pinta paralog that likely binds the chromophore in butterflies. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 7, 2019

Diversity in grain amaranths and relatives distinguished by genotyping by sequencing (GBS).

The genotyping by sequencing (GBS) method has become a molecular marker technology of choice for many crop plants because of its simultaneous discovery and evaluation of a large number of single nucleotide polymorphisms (SNPs) and utility for germplasm characterization. Genome representation and complexity reduction are the basis for GBS fingerprinting and can vary by species based on genome size and other sequence characteristics. Grain amaranths are a set of three species that were domesticated in the New World to be high protein, pseudo-cereal grain crops. The goal of this research was to employ the GBS technique for diversity evaluation in grain amaranth accessions and close relatives from six Amaranthus species and determine genetic differences and similarities between groupings. A total of 10,668 SNPs were discovered in 94 amaranth accessions with ApeKI complexity reduction and 10X genome coverage Illumina sequencing. The majority of the SNPs were species specific, with 4,568 and 3,082 for the two grain amaranths originating in Central America, Amaranthus cruentus and A. hypochondriacus, and 3,284 found amongst both A. caudatus, originally domesticated in South America, and its close relative, A. quitensis. The distance matrix based on shared alleles provided information on the close relationships of the two cultivated Central American species with each other and of the wild and cultivated South American species with each other, as distinguished from the outgroup with two wild species, A. powellii and A. retroflexus. The GBS data also distinguished admixture between each pair of species and the geographical origins and seed colors of the accessions. The SNPs we discovered here can be used for marker development for future amaranth study.


July 7, 2019

Probing genomic aspects of the multi-host pathogen Clostridium perfringens reveals significant pangenome diversity, and a diverse array of virulence factors.

Clostridium perfringens is an important cause of animal and human infections; however, information about the genetic makeup of this pathogenic bacterium is currently limited. In this study, we sought to understand and characterise the genomic variation, pangenomic diversity, and key virulence traits of 56 C. perfringens strains, which included 51 public and 5 newly sequenced and annotated genomes, using Whole Genome Sequencing. Our investigation revealed that C. perfringens has an “open” pangenome comprising 11667 genes and 12.6% of core genes, identified as the most divergent single-species Gram-positive bacterial pangenome currently reported. Our computational analyses also defined C. perfringens phylogeny (16S rRNA gene) in relation to some 25 Clostridium species, with C. baratii and C. sardiniense determined to be the closest relatives. Profiling virulence-associated factors confirmed the presence of well-characterised C. perfringens-associated exotoxin genes including α-toxin (plc), enterotoxin (cpe), and Perfringolysin O (pfo or pfoA), although interestingly there did not appear to be a close correlation between encoded toxin type and disease phenotype. Furthermore, genomic analysis indicated significant horizontal gene transfer events as defined by the presence of prophage genomes, and notably the absence of CRISPR defence systems in >70% (40/56) of the strains. In relation to antimicrobial resistance mechanisms, tetracycline resistance genes (tet) and anti-defensin genes (mprF) were consistently detected in silico (tet: 75%; mprF: 100%). However, pre-antibiotic era strain genomes did not encode tet, thus implying antimicrobial selective pressures in C. perfringens evolutionary history over the past 80 years. This study provides new genomic understanding of this genetically divergent multi-host bacterium, and further expands our knowledge of this medically and veterinary important pathogen.
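
As a rough illustration of what a core-gene percentage means, the sketch below computes a pangenome and a core genome from per-strain gene sets. It assumes the strictest definition of "core" (a gene present in every strain); the abstract does not state which threshold or software the authors used, and the strain names and gene lists here are hypothetical toy data.

```python
from typing import Dict, Set, Tuple


def core_and_pangenome(strain_genes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (pangenome, core) gene sets from per-strain gene sets.

    Assumption: a gene is 'core' only if it occurs in every strain.
    """
    gene_sets = list(strain_genes.values())
    if not gene_sets:
        return set(), set()
    pangenome = set().union(*gene_sets)    # every gene seen in any strain
    core = set.intersection(*gene_sets)    # genes shared by all strains
    return pangenome, core


# Hypothetical toy input: three strains with a handful of gene names.
toy = {
    "strain_A": {"plc", "pfoA", "tet", "geneX"},
    "strain_B": {"plc", "pfoA", "geneY"},
    "strain_C": {"plc", "cpe", "geneX"},
}

pangenome, core = core_and_pangenome(toy)
core_fraction = len(core) / len(pangenome)
print(f"pangenome: {len(pangenome)} genes, core: {len(core)} genes "
      f"({core_fraction:.1%} of the pangenome)")
```

With real data the gene sets would come from ortholog clustering across annotated genomes, and many tools relax "every strain" to a high percentage of strains.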


July 7, 2019

Safety evaluation of HOWARU®Restore (Lactobacillus acidophilus NCFM, Lactobacillus paracasei Lpc-37, Bifidobacterium animalis subsp. lactis Bl-04 and B. lactis Bi-07) for antibiotic resistance, genomic risk factors, and acute toxicity.

Although probiotic lactobacilli and bifidobacteria are generally considered safe by various regulatory agencies, safety properties, such as absence of transferable antibiotic resistance, must still be determined for each strain prior to market introduction as a probiotic. Safety requirements for probiotics vary regionally and evaluation methods are not standardized, therefore methodologies are often adopted from food ingredients or chemicals to assess microbial safety. Four individual probiotic strains, Lactobacillus acidophilus NCFM®, Lactobacillus paracasei Lpc-37®, Bifidobacterium animalis subsp. lactis strains Bl-04®, and Bi-07®, and their combination (HOWARU®Restore) were examined for antibiotic resistance by broth microdilution culture, toxin genes by PCR and genome mining, and acute oral toxicity in rats. Only B. lactis Bl-04 exhibited antibiotic resistance above a regulated threshold due to a tetW gene previously demonstrated to be non-transferable. Genomic mining did not reveal any bacterial toxin genes known to harm mammalian hosts in any of the strains. The rodent studies did not indicate any evidence of acute toxicity following a dose of 1.7-4.1 × 10^12 CFU/kg body weight. Considering a 100-fold safety margin, this corresponds to 1.2-2.8 × 10^12 CFU for a 70 kg human. Our findings demonstrate a comprehensive approach of in vitro, in silico, and in vivo safety testing for probiotics. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
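
The dose conversion in the abstract can be checked with a few lines of arithmetic. The sketch below assumes the simplest reading of the text: the per-kilogram rat dose is scaled to a 70 kg body weight and divided by the 100-fold safety margin. The function name and its parameters are illustrative, not taken from the paper.

```python
def human_equivalent_cfu(dose_cfu_per_kg: float,
                         body_weight_kg: float = 70.0,
                         safety_factor: float = 100.0) -> float:
    """Scale a per-kg dose to a whole-body dose and apply a safety margin."""
    return dose_cfu_per_kg * body_weight_kg / safety_factor


# Lower and upper bounds of the tested dose range, in CFU per kg body weight.
low = human_equivalent_cfu(1.7e12)
high = human_equivalent_cfu(4.1e12)
print(f"{low:.2e} to {high:.2e} CFU for a 70 kg human")
```

Under these assumptions the bounds come out at about 1.19 × 10^12 and 2.87 × 10^12 CFU, which matches the quoted lower bound after rounding and the quoted upper bound of 2.8 after truncation.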


July 7, 2019

Study of mesophilic Aeromonas salmonicida A527 strain sheds light on the species’ lifestyles and taxonomic dilemma.

The Gram-negative bacterium Aeromonas salmonicida contains five subspecies: salmonicida, smithia, achromogenes, masoucida and pectinolytica. Pectinolytica is a mesophilic subspecies with the ability to thrive at a wide range of temperatures, including 37°C, while the four other subspecies are psychrophilic, restricted to lower temperatures. The psychrophilic subspecies are known to infect a wide range of fishes. However, there is no evidence of pathogenicity for the mesophilic subspecies pectinolytica. Study of the differences between the mesophilic and psychrophilic subspecies is hampered by the lack of completely sequenced and closed genomes from the mesophilic subspecies. A previous study reported that insertion sequences, which can induce genomic rearrangements at temperatures around 25°C, could be one of the determinants explaining the differences in lifestyle (mesophilic or psychrophilic) between the subspecies. In this study, the genome of mesophilic strain A527 of A. salmonicida was sequenced, closed and analyzed to investigate the mesophilic-psychrophilic discrepancy. This reference genome supports the hypothesis that insertion sequences are major determinants of the lifestyle differences between the A. salmonicida subspecies. Moreover, the phylogenetic analysis performed to position strain A527 within the taxonomy raises an issue regarding the intraspecies structure of A. salmonicida. © FEMS 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.


July 7, 2019

Genome-wide epigenetic studies in chicken: A review

Over the years, farmed birds have been selected on various performance traits mainly through genetic selection. However, many studies have shown that genetics may not be the sole contributor to phenotypic plasticity. Gene expression programs can be influenced by environmentally induced epigenetic changes that may alter the phenotypes of the developing animals. Recently, high-throughput sequencing techniques became sufficiently affordable thanks to technological advances to study whole epigenetic landscapes in model plants and animals. In birds, a growing number of studies recently took advantage of these techniques to gain insights into the epigenetic mechanisms of gene regulation in processes such as immunity or environmental adaptation. Here, we review the current gain of knowledge on the chicken epigenome made possible by recent advances in high-throughput sequencing techniques by focusing on the two most studied epigenetic modifications, DNA methylation and histone post-translational modifications. We discuss and provide insights about designing and performing analyses to further explore avian epigenomes. A better understanding of the molecular mechanisms underlying the epigenetic regulation of gene expression in relation to bird phenotypes may provide new knowledge and markers that should undoubtedly contribute to a sustainable poultry production.


July 7, 2019

Evolutionary context of non-sorbitol-fermenting Shiga toxin-producing Escherichia coli O55:H7.

In July 2014, an outbreak of Shiga toxin-producing Escherichia coli (STEC) O55:H7 in England involved 31 patients, 13 (42%) of whom had hemolytic uremic syndrome. Isolates were sequenced, and the sequences were compared with publicly available sequences of E. coli O55:H7 and O157:H7. A core-genome phylogeny of the evolutionary history of the STEC O55:H7 outbreak strain revealed that the most parsimonious model was a progenitor enteropathogenic O55:H7 sorbitol-fermenting strain, lysogenized by a Shiga toxin (Stx) 2a-encoding phage, followed by loss of the ability to ferment sorbitol because of a non-sense mutation in srlA. The parallel, convergent evolutionary histories of STEC O157:H7 and STEC O55:H7 may indicate a common driver in the evolutionary process. Because emergence of STEC O157:H7 as a clinically significant pathogen was associated with acquisition of the Stx2a-encoding phage, the emergence of STEC O55:H7 harboring the stx2a gene is of public health concern.


July 7, 2019

An update on bioinformatics resources for plant genomics research

Next-generation sequencing and traditional Sanger sequencing methods are of great significance in unraveling the complexity of plant genomes. These are constantly generating heaps of sequence data to be analyzed, annotated and stored. This has created a revolutionary demand for bioinformatics tools and software that can perform these functions. A large number of potentially useful bioinformatics tools and plant genome databases have been created that have greatly simplified the analysis and storage of vast amounts of sequence data. The information garnered using the available bioinformatics methods has greatly helped in understanding plant genome structure. Despite the availability of a good number of such tools, the information pouring in from single-gene sequencing and various whole-genome sequencing projects is overwhelming; thus, further innovations and improved methods are needed to sift through this sequence data and assemble genomes. The current review focuses on diverse bioinformatics approaches and methods developed to systematically analyze and store plant sequence data. Finally, it outlines the bottlenecks in plant genome analysis and some possible solutions that could be used to overcome them.


July 7, 2019

Genome sequencing brought Gossypium biology research into a new era.

The first sequenced diploid cotton genome was published in 2012 by the group led by the Institute of Cotton Research, Chinese Academy of Agricultural Sciences. Cotton genomics research subsequently entered a period of rapid development. The accumulating data have provided new insights into the evolution and domestication of cotton, the development of important agronomic traits, and strategies for improving cotton quality and production.


July 7, 2019

Bi-level error correction for PacBio long reads.

The latest sequencing technologies such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines can generate long reads of thousands of bases, much longer than the reads of hundreds of bases generated by Illumina machines. However, these long reads are prone to much higher error rates, for example 15%, making downstream analysis and applications very difficult. Error correction is a process to improve the quality of sequencing data. Hybrid correction strategies have recently been proposed that use Illumina reads with low error rates to fix sequencing errors in the noisy long reads, with good performance. In this paper, we propose a new method named Bicolor, a bi-level framework of hybrid error correction for further improving the quality of PacBio long reads. At the first level, our method uses a de Bruijn graph-based error correction idea to search paths in pairs of solid k-mers iteratively with an increasing length of k-mer. At the second level, we combine the processed results under different parameters from the first level. In particular, a multiple sequence alignment algorithm is used to align those similar long reads, followed by a voting algorithm which determines the final base at each position of the reads. We compare Bicolor with three state-of-the-art methods on three real data sets. Results demonstrate that Bicolor always achieves the highest identity ratio. Bicolor also achieves a higher alignment ratio and a higher number of aligned reads than the current methods on two data sets. On the third data set, our method is closely competitive with the current methods in terms of number of aligned reads and genome coverage. The C++ source code of our algorithm is freely available at https://github.com/yuansliu/Bicolor.
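
The second-level idea, voting over aligned versions of a read, can be pictured with a minimal per-column majority vote. This is only a sketch of the general technique, not Bicolor's C++ implementation: it assumes the reads have already been aligned into rows of equal length with `-` marking gaps, and the function name and example rows are hypothetical.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned: List[str], gap: str = "-") -> str:
    """Pick the most frequent symbol in each alignment column.

    Columns where the gap symbol wins are dropped from the consensus.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("all aligned rows must have the same length")

    consensus = []
    for column in zip(*aligned):
        # Ties are resolved in favour of the symbol seen first in the column.
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Three hypothetical corrected versions of the same read, already aligned.
rows = [
    "ACG-TTA",
    "ACGATTA",
    "ACG-TCA",
]
print(majority_consensus(rows))
```

In this toy case the inserted `A` in the second row and the `C` in the third row are each outvoted two to one, so the consensus keeps neither.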


July 7, 2019

Ultraaccurate genome sequencing and haplotyping of single human cells.

Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10^-8 and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs. Copyright © 2017 the Author(s). Published by PNAS.
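
For readers unfamiliar with the N50 statistic quoted above, the sketch below shows one common way to compute it: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of the combined length. The function name and the example lengths are illustrative and do not come from the SISSOR data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        return 0
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to at least half
        # of the combined length defines the N50.
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-negative lengths


# Hypothetical contig lengths in arbitrary units.
contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(contigs))
```

An N50 above 7 Mb therefore means that at least half of the assembled haplotype sequence sits in contigs longer than 7 Mb.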


July 7, 2019

A blaOXA-181-harbouring multi-resistant ST147 Klebsiella pneumoniae isolate from Pakistan that represent an intermediate stage towards pan-drug resistance.

Carbapenem resistant Klebsiella pneumoniae (CR-KP) infections are an ever-increasing global issue, especially in the Indian subcontinent. Here we report genetic insight into a blaOXA-181 harbouring Klebsiella pneumoniae, belonging to the pandemic lineage ST147, that represents an intermediate stage towards pan-drug resistance. The CR-KP isolate DA48896 was isolated from a patient from Pakistan and was susceptible only to tigecycline and colistin. It harboured blaOXA-181 and was assigned to sequence type ST147. Analysis from whole genome sequencing revealed a very high sequence similarity to the previously sequenced pan-resistant K. pneumoniae isolate MS6671 from the United Arab Emirates. The two isolates are very closely related with only 46 chromosomal nucleotide differences, 14 indels and differences in plasmid content. Both carry a substantial number of plasmid-borne and chromosomally encoded resistance determinants. Interestingly, the two differences in susceptibility between the isolates could be attributed to DA48896 lacking an insertion of blaOXA-181 into the mgrB gene that results in colistin resistance in MS6671 and SNPs affecting AcrAB efflux pump expression likely to result in tigecycline resistance. These differences between the otherwise very similar isolates indicate that strong selection has occurred for resistance towards these last-resort drugs and illustrate the trajectory of resistance evolution of OXA-181-producing versions of the ST147 international risk clone.


July 7, 2019

Characterization of Fusobacterium varium Fv113-g1 isolated from a patient with ulcerative colitis based on complete genome sequence and transcriptome analysis.

Fusobacterium spp. present in the oral and gut flora are carcinogenic and are associated with the risk of pancreatic and colorectal cancers. Fusobacterium spp. are also implicated in a broad spectrum of human pathologies, including Crohn’s disease and ulcerative colitis (UC). Here we report the complete genome sequence of Fusobacterium varium Fv113-g1 (genome size, 3.96 Mb) isolated from a patient with UC. Comparative genome analyses suggested that Fv113-g1 can be assigned to F. varium; in particular, it could be reclassified as a notable F. varium subsp. similar to F. ulcerans because of partially shared orthologs. Compared with the genome sequences of F. varium ATCC 27725 (genome size, 3.30 Mb) and other strains of Fusobacterium spp., Fv113-g1 possesses many accessory pan-genome sequences with noteworthy multiple virulence factors, including 44 autotransporters (type V secretion system, T5SS) and 13 Fusobacterium adhesion (FadA) paralogs involved in potential mucosal inflammation. Indeed, transcriptome analysis demonstrated that Fv113-g1-specific accessory genes, such as multiple T5SS and fadA paralogs, showed notably higher expression in D-MEM cultivation than in brain heart infusion broth. This implies that growth conditions may enhance the expression of such potential virulence factors, leading to remarkable survival against other gut microorganisms and to pathogenicity toward the human intestinal epithelium.


July 7, 2019

Complete genome sequence and comparative genomics of the golden pompano (Trachinotus ovatus) pathogen, Vibrio harveyi strain QT520.

Vibrio harveyi is a Gram-negative, halophilic bacterium that is an opportunistic pathogen of commercially farmed marine vertebrate species. To understand the pathogenicity of this species, the genome of V. harveyi QT520 was analyzed and compared to that of other strains. The results showed the genome of QT520 has two unique circular chromosomes and three endogenous plasmids, totaling 6,070,846 bp with a 45% GC content, 5,701 predicted ORFs, 134 tRNAs and 37 rRNAs. Common virulence factors, including ACF, IlpA, OmpU, Flagellin, Cya, Hemolysin and MARTX, were detected in the genome and are likely responsible for the virulence of QT520. The results of genome comparisons with strains ATCC 33843 (392 (MAV)) and ATCC 43516 showed that greater numbers of genes associated with types I, II, III, IV and VI secretion systems were detected in QT520 than in the other strains, suggesting that QT520 is a highly virulent strain. In addition, three plasmids were observed only in the complete genome sequence of strain QT520. In plasmid p1 of QT520, specific virulence factors (cyaB, hlyB and rtxA) were identified, suggesting that the pathogenicity of this strain is plasmid-associated. Phylogenetic analysis of 12 complete Vibrio sp. genomes using ANI values, core genes and MLST revealed that QT520 was most closely related to ATCC 33843 (392 (MAV)) and ATCC 43516, suggesting that QT520 belongs to the species V. harveyi. This report is the first to describe the complete genome sequence of a V. harveyi strain isolated from an outbreak in a fish species in China. In addition, to the best of our knowledge, this report is the first to compare the V. harveyi genomes of several strains. The results of this study will expand our understanding of the genome, genetic characteristics, and virulence factors of V. harveyi, setting the stage for studies of pathogenesis, diagnostics, and disease prevention.

