Vilardell Mireia, Parra Genis, Civit Sergi
Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany.
Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany.
Biomed Res Int. 2014;2014:282343. doi: 10.1155/2014/282343. Epub 2014 Sep 15.
Classically, gene prediction programs are based on detecting signals such as boundary sites (splice sites, starts, and stops) and coding regions in the DNA sequence in order to build potential exons and join them into a gene structure. Although it is now possible to improve their performance with additional information from related species and/or cDNA databases, further improvement at any step could help to obtain better predictions. Here, we present WISCOD (web-enabled tool for the identification of significant protein coding regions), a novel software tool that tackles the exon prediction problem in eukaryotic genomes. WISCOD can detect real exons in large lists of potential exons, and it provides an easy-to-use global P value, called the expected probability of being a false exon (EPFE), that is useful for ranking potential exons in a probabilistic framework without additional computational cost. The advantage of our approach is that it significantly increases specificity and sensitivity (both between 80% and 90%) in comparison to other ab initio methods (where they are in the range of 70-75%). WISCOD is written in Java and R and can be downloaded and run locally on Linux and Windows platforms.
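To make the idea of ranking potential exons by a P value concrete, the following is a minimal Python sketch and not WISCOD's implementation. The `CandidateExon` type, the toy coordinates, the `p_value` field (a stand-in for an EPFE-like score), and the 0.05 cutoff are all hypothetical. The abstract does not say how EPFE is computed or how specificity is defined, so the sketch assumes the values are already given and uses TP/(TP+FN) for sensitivity and TN/(TN+FP) for specificity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateExon:
    """Hypothetical record for one potential exon (not a WISCOD data type)."""
    start: int
    end: int
    p_value: float  # assumed stand-in for an EPFE-like score; lower = more likely real
    is_real: bool   # annotation label, used only to evaluate the ranking


def rank_by_p_value(candidates):
    """Return candidates ordered from most to least likely to be a real exon."""
    return sorted(candidates, key=lambda c: c.p_value)


def evaluate(candidates, threshold):
    """Call a candidate 'real' when p_value <= threshold, then score the calls.

    Assumed definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP).
    """
    tp = fp = tn = fn = 0
    for c in candidates:
        predicted_real = c.p_value <= threshold
        if predicted_real and c.is_real:
            tp += 1
        elif predicted_real and not c.is_real:
            fp += 1
        elif not predicted_real and c.is_real:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy data, invented purely for illustration.
candidates = [
    CandidateExon(100, 250, 0.001, True),
    CandidateExon(400, 520, 0.20, False),
    CandidateExon(900, 1010, 0.03, True),
    CandidateExon(1500, 1590, 0.04, False),
    CandidateExon(2000, 2180, 0.60, True),
]

ranked = rank_by_p_value(candidates)
sensitivity, specificity = evaluate(candidates, threshold=0.05)

for exon in ranked:
    print(f"{exon.start}-{exon.end}\tp={exon.p_value}")
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Once each candidate carries a precomputed score, ranking is a single sort and the cutoff can be varied to trade sensitivity against specificity.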