
Improving the specificity of exon prediction using comparative genomics.

Author information

Wu Jing

Affiliation

Department of Statistics, Purdue University, 150 N. University Street, West Lafayette, IN 47906, USA.

Publication information

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S13. doi: 10.1186/1471-2164-9-S2-S13.

DOI:10.1186/1471-2164-9-S2-S13
PMID:18831778
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2559877/
Abstract

BACKGROUND

Computational gene prediction tools routinely generate large volumes of predicted coding exons (putative exons). A common limitation of these tools is relatively low specificity, because the large amount of non-coding sequence gives rise to many false-positive predictions.

METHODS

A statistical approach is developed that substantially improves gene prediction specificity. The key idea is to exploit the evolutionary conservation of coding exons. First, using the homology between the genomes of two related species, a probability model is built for the evolutionary conservation pattern of codons across genomes. A second probability model, for the dependency between adjacent codons/triplets, is added to distinguish coding exons from random sequences. Finally, a log odds ratio is used to classify putative exons as either coding exons or non-coding regions.
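The classification step can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not give the model's form, so the sketch assumes two hypothetical probability tables (`coding` and `background`) over aligned human/mouse codon pairs, treats the pairs as independent, and leaves out the adjacent-codon dependency term. The function names, the `floor` value, and the zero threshold are all assumptions made for illustration.

```python
import math

def log_odds(human, mouse, coding, background, floor=1e-6):
    """Sum log-likelihood ratios over aligned, in-frame codon pairs.

    `coding` and `background` are hypothetical tables mapping
    (human_codon, mouse_codon) -> probability. Pairs missing from a
    table fall back to `floor` so the logarithm stays defined.
    """
    score = 0.0
    usable = min(len(human), len(mouse))
    # Step through complete triplets only; a trailing partial codon is ignored.
    for i in range(0, usable - 2, 3):
        pair = (human[i:i + 3], mouse[i:i + 3])
        p_coding = coding.get(pair, floor)
        p_background = background.get(pair, floor)
        score += math.log(p_coding / p_background)
    return score

def classify(human, mouse, coding, background, threshold=0.0):
    """Label a putative exon by comparing its log odds with a threshold."""
    score = log_odds(human, mouse, coding, background)
    return "coding" if score > threshold else "non-coding"

# Toy tables, invented for this example only.
coding = {("ATG", "ATG"): 0.5, ("GCT", "GCC"): 0.25}
background = {("ATG", "ATG"): 0.125, ("GCT", "GCC"): 0.25, ("AAA", "TTT"): 0.1}

label_conserved = classify("ATGGCT", "ATGGCC", coding, background)
label_random = classify("AAA", "TTT", coding, background)
```

In this toy setup a synonymous change (GCT to GCC) is as likely under both tables and contributes nothing, while the conserved start codon pushes the score toward the coding class. A fuller version would add the adjacent-codon dependency, for example by conditioning each pair's probability on the preceding pair.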

RESULTS

The method was tested on pre-aligned human-mouse sequences in which the putative exons were predicted by GENSCAN and TWINSCAN. The proposed method improves exon specificity by 73% and 32%, respectively, while the loss of sensitivity is ≤ 1%. The method also retains 98% of the RefSeq gene structures correctly predicted by TWINSCAN while removing 26% of predicted genes that are in non-coding regions. The estimated number of true exons in TWINSCAN's predictions is 157,070. The results and executable code can be downloaded from http://www.stat.purdue.edu/~jingwu/codon/
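To make the trade-off between specificity and sensitivity concrete, here is a small sketch of how the two exon-level measures respond when a filter discards mostly false positives. It assumes the convention common in gene-prediction evaluation, where specificity is TP / (TP + FP); the abstract does not state which definition it uses, nor whether the reported gains are relative or absolute. The counts below are invented for illustration and are not taken from the paper.

```python
def exon_metrics(true_pos, false_pos, false_neg):
    """Return (sensitivity, specificity) at the exon level.

    Specificity is taken as TP / (TP + FP), the share of predicted
    exons that are real (an assumed convention, see the text above).
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_pos / (true_pos + false_pos)
    return sensitivity, specificity

# Hypothetical counts: 1,000 real exons, before and after filtering.
before = exon_metrics(true_pos=900, false_pos=900, false_neg=100)
after = exon_metrics(true_pos=891, false_pos=300, false_neg=109)

sensitivity_loss = before[0] - after[0]
relative_specificity_gain = (after[1] - before[1]) / before[1]
```

With these made-up numbers the filter removes 600 false positives at the cost of 9 true exons, so specificity rises sharply while sensitivity drops by less than one percentage point.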

CONCLUSION

The proposed method demonstrates an application of the evolutionary conservation principle to coding exons. It is a complementary method that can serve as an additional criterion for refining many existing gene predictions.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6abd/2559877/9eb4f9dc7f0c/1471-2164-9-S2-S13-1.jpg


