A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.

Authors

Cingolani Pablo, Platts Adrian, Wang Le Lily, Coon Melissa, Nguyen Tung, Wang Luan, Land Susan J, Lu Xiangyi, Ruden Douglas M

Affiliation

Institute of Environmental Health Sciences, Wayne State University, Detroit, MI, USA.

Publication information

Fly (Austin). 2012 Apr-Jun;6(2):80-92. doi: 10.4161/fly.19695.

Abstract

We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, and intergenic regions. Predicted coding effects include synonymous or non-synonymous amino acid replacements, start codon gains or losses, stop codon gains or losses, and frame shifts. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb of unique sequence, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w(1118); iso-2; iso-3 strain and the reference y(1); cn(1) bw(1) sp(1) strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs fall into other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains in the 5'UTR (297 SNPs). We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5' and 3' UTRs are reservoirs of genetic variation that changes the termini of proteins during the evolution of the Drosophila genus. As genome sequencing becomes inexpensive and routine, SnpEff enables an individual laboratory to perform rapid analyses of whole-genome sequencing data.
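
The coding-effect categories named in the abstract can be illustrated with a minimal Python sketch. It is not SnpEff's implementation: the function name `classify_coding_snp`, its parameters, and the category labels are hypothetical, and the sketch assumes the standard genetic code, a single-base substitution inside one in-frame codon, and no handling of start codons, splice sites, or frame shifts.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, pos_in_codon, alt_base):
    """Classify a single-base substitution within one in-frame codon.

    Hypothetical helper for illustration only.
    ref_codon:    reference codon on the coding strand, e.g. "GAA"
    pos_in_codon: 0-based position of the SNP within the codon (0, 1, or 2)
    alt_base:     alternative base at that position
    """
    ref_codon = ref_codon.upper()
    alt_codon = (
        ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    )
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]

    if ref_aa == alt_aa:
        return "synonymous"      # same amino acid (or stop to stop)
    if alt_aa == "*":
        return "stop_gained"     # sense codon becomes a stop codon
    if ref_aa == "*":
        return "stop_lost"       # stop codon becomes a sense codon
    return "non_synonymous"      # amino acid replacement


# N/S ratio recomputed from the counts reported in the abstract.
n_over_s = 4467 / 15842
```

For example, `classify_coding_snp("GAA", 0, "T")` turns GAA (Glu) into TAA and is reported as `stop_gained`, while `classify_coding_snp("GAA", 2, "G")` yields GAG (still Glu) and is reported as `synonymous`. The computed `n_over_s` rounds to the 0.28 quoted in the abstract.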


