Improving the specificity of exon prediction using comparative genomics.

Author information

Wu Jing

Affiliation

Department of Statistics, Purdue University, 150 N. University Street, West Lafayette, IN 47906, USA.

Publication information

BMC Genomics. 2008 Sep 16;9(Suppl 2):S13. doi: 10.1186/1471-2164-9-S2-S13.

Abstract

BACKGROUND

Computational gene prediction tools routinely generate large volumes of predicted coding exons (putative exons). A common limitation of these tools is their relatively low specificity, owing to the large amount of non-coding sequence in the genome.

METHODS

A statistical approach is developed that substantially improves the specificity of gene prediction. The key idea is to exploit the evolutionary conservation of coding exons. First, using the homology between the genomes of two related species, a probability model is built for the evolutionary conservation pattern of codons across genomes. Second, a probability model for the dependency between adjacent codons/triplets is added to distinguish coding exons from random sequences. Finally, a log odds ratio is derived to classify putative exons as either coding exons or non-coding regions.
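The classification step can be illustrated with a minimal sketch. This is not the author's model: the abstract does not give its form, so the version below assumes each aligned human-mouse triplet pair has a probability under a "coding" model and a "non-coding" model, and it ignores the dependency between adjacent codons. The function names, the probability tables, the floor value, and the threshold are all hypothetical, and the numbers are toy values rather than estimates from data.

```python
import math

# Toy probabilities of aligned (human, mouse) triplet pairs under each model.
# Hypothetical values for illustration only, not estimates from real data.
CODING_PROB = {("ATG", "ATG"): 0.5, ("GCT", "GCC"): 0.3, ("GCT", "TAA"): 0.01}
NONCODING_PROB = {("ATG", "ATG"): 0.1, ("GCT", "GCC"): 0.1, ("GCT", "TAA"): 0.1}


def log_odds(human, mouse, coding_prob, noncoding_prob, floor=1e-6):
    """Sum log(P_coding / P_noncoding) over aligned triplet pairs.

    Assumes the two sequences are gap-free, in frame, and equally long.
    Pairs missing from a table get the small probability `floor`.
    """
    if len(human) != len(mouse) or len(human) % 3 != 0:
        raise ValueError("sequences must have equal length, a multiple of 3")
    score = 0.0
    for i in range(0, len(human), 3):
        pair = (human[i:i + 3], mouse[i:i + 3])
        p_coding = coding_prob.get(pair, floor)
        p_noncoding = noncoding_prob.get(pair, floor)
        score += math.log(p_coding / p_noncoding)
    return score


def classify(score, threshold=0.0):
    """Label a putative exon from its log odds score (threshold is assumed)."""
    return "coding" if score > threshold else "non-coding"


conserved = log_odds("ATGGCT", "ATGGCC", CODING_PROB, NONCODING_PROB)
disrupted = log_odds("GCT", "TAA", CODING_PROB, NONCODING_PROB)
print(classify(conserved), classify(disrupted))
```

A positive score means the aligned triplets look more like the coding model than the non-coding one. In practice the threshold would presumably be chosen to trade a small loss of sensitivity for a gain in specificity.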

RESULTS

The method was tested on pre-aligned human-mouse sequences in which the putative exons were predicted by GENSCAN and TWINSCAN. The proposed method improves exon specificity by 73% and 32%, respectively, while the loss of sensitivity is ≤ 1%. The method also retains 98% of the RefSeq gene structures correctly predicted by TWINSCAN while removing 26% of the predicted genes that lie in non-coding regions. The estimated number of true exons in TWINSCAN's predictions is 157,070. The results and the executable code can be downloaded from http://www.stat.purdue.edu/~jingwu/codon/
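For readers less familiar with these metrics, the sketch below shows how exon-level sensitivity and specificity are commonly computed in gene prediction evaluation. It assumes the usual convention that specificity is the fraction of predicted exons that are correct and sensitivity is the fraction of annotated exons that are recovered; the abstract does not state which definition the paper uses. The function name and the exon coordinates are hypothetical toy values, not data from the study.

```python
def exon_metrics(predicted, annotated):
    """Return (sensitivity, specificity) for sets of (start, end) exons.

    An exon counts as correct only if both boundaries match exactly.
    """
    predicted, annotated = set(predicted), set(annotated)
    true_pos = len(predicted & annotated)
    sensitivity = true_pos / len(annotated) if annotated else 0.0
    specificity = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical coordinates: two annotated exons and four putative exons.
annotated = {(100, 200), (300, 400)}
predicted = {(100, 200), (300, 400), (500, 600), (700, 800)}
# After a filter removes one false positive and keeps both true exons.
filtered = {(100, 200), (300, 400), (500, 600)}

before = exon_metrics(predicted, annotated)
after = exon_metrics(filtered, annotated)
print(before, after)
```

In this toy case, removing a false positive raises specificity and leaves sensitivity unchanged, which is the kind of trade-off the reported results describe.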

CONCLUSION

The proposed method demonstrates an application of the evolutionary conservation principle to coding exons. It is a complementary method that can be used as an additional criterion to refine many existing gene predictions.
