JIGSAW: integration of multiple sources of evidence for gene prediction.

Authors

Allen Jonathan E, Salzberg Steven L

Affiliation

Center for Bioinformatics and Computational Biology, University of Maryland Institute for Advanced Computer Studies, College Park, MD 20742, USA.

Publication

Bioinformatics. 2005 Sep 15;21(18):3596-603. doi: 10.1093/bioinformatics/bti609. Epub 2005 Aug 2.

Abstract

MOTIVATION

Computational gene finding systems play an important role in finding new human genes, although no systems are yet accurate enough to predict all or even most protein-coding regions perfectly. Ab initio programs can be augmented by evidence such as expression data or protein sequence homology, which improves their performance. The amount of such evidence continues to grow, but computational methods continue to have difficulty predicting genes when the evidence is conflicting or incomplete. Genome annotation pipelines collect a variety of types of evidence about gene structure and synthesize the results, which can then be refined further through manual, expert curation of gene models.

RESULTS

JIGSAW is a new gene finding system designed to automate the process of predicting gene structure from multiple sources of evidence, with results that often match the performance of human curators. JIGSAW computes the relative weight of different lines of evidence using statistics generated from a training set, and then combines the evidence using dynamic programming. Our results show that JIGSAW's performance is superior to ab initio gene finding methods and to other pipelines such as Ensembl. Even without evidence from alignment to known genes, JIGSAW can substantially improve gene prediction accuracy as compared with existing methods.
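The two steps named here, weighting evidence from training statistics and combining it with dynamic programming, can be illustrated with a toy sketch. This is not JIGSAW's actual model or code. It assumes a much simpler setting in which each evidence source casts a per-base vote (1 = coding, 0 = non-coding), each source's weight is the log-odds of its accuracy on a labelled training region, and a Viterbi-style pass picks the best label sequence. The names `train_weights`, `combine`, and `switch_penalty` are hypothetical.

```python
import math


def train_weights(training_votes, truth):
    """Estimate one log-odds weight per evidence source (toy assumption)."""
    weights = {}
    for source, votes in training_votes.items():
        correct = sum(v == t for v, t in zip(votes, truth))
        # Laplace smoothing keeps the log-odds finite for perfect sources.
        accuracy = (correct + 1) / (len(truth) + 2)
        weights[source] = math.log(accuracy / (1 - accuracy))
    return weights


def combine(votes, weights, switch_penalty=1.0):
    """Viterbi-style dynamic programming over per-base labels 0 and 1."""
    n = len(next(iter(votes.values())))
    labels = (0, 1)

    def score(pos, label):
        # A source adds its weight when it agrees with the label,
        # and subtracts it when it disagrees.
        return sum(w if votes[s][pos] == label else -w
                   for s, w in weights.items())

    best = [score(0, label) for label in labels]
    back = []  # back[i][label] = best previous label for position i + 1
    for pos in range(1, n):
        new_best, pointers = [], []
        for label in labels:
            # Changing label between adjacent bases costs switch_penalty.
            candidates = [best[p] - (switch_penalty if p != label else 0.0)
                          for p in labels]
            prev = max(labels, key=lambda p: candidates[p])
            new_best.append(candidates[prev] + score(pos, label))
            pointers.append(prev)
        best = new_best
        back.append(pointers)

    # Trace back from the best final label.
    label = max(labels, key=lambda l: best[l])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return path[::-1]


# Made-up training data: source "A" is always right, source "B" is noisy.
truth = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
training_votes = {
    "A": [0, 0, 1, 1, 1, 0, 0, 1, 1, 0],
    "B": [0, 1, 1, 0, 1, 0, 1, 1, 0, 0],
}
weights = train_weights(training_votes, truth)
prediction = combine({"A": [0, 1, 1, 1, 0], "B": [1, 0, 0, 1, 0]}, weights)
```

In this sketch the more reliable source receives the larger weight, so it wins where the two sources conflict, while the switch penalty discourages label changes that only weak evidence supports.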

AVAILABILITY

JIGSAW is available as an open source software package at http://cbcb.umd.edu/software/jigsaw.
