Reranking candidate gene models with cross-species comparison for improved gene prediction.

Author information

Liu Qian, Crammer Koby, Pereira Fernando C N, Roos David S

Affiliation

Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Publication information

BMC Bioinformatics. 2008 Oct 14;9:433. doi: 10.1186/1471-2105-9-433.

Abstract

BACKGROUND

Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc.). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features.

RESULTS

We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+.
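The reranking idea can be illustrated with a small sketch: each of the K candidates keeps its original gene-finder score, gains a few cross-species features, and the list is re-sorted by a global weighted score. This is a minimal sketch under stated assumptions, not the authors' scoring function. The `Candidate` fields, the three features (protein identity to the ortholog, a Jaccard overlap of splice-site positions, and signal peptide agreement), and the weights are all hypothetical. The sketch also assumes the candidates have already been aligned to a putative ortholog, so that splice sites share one coordinate system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One of the K best gene models for a locus (all fields are hypothetical)."""
    model_id: str
    base_score: float             # score from the original gene finder (e.g. derived from its ranking)
    protein_identity: float       # identity of the translated CDS to the putative ortholog, 0..1
    splice_sites: frozenset[int]  # intron boundaries, assumed already mapped into ortholog coordinates
    has_signal_peptide: bool      # whether a signal peptide is predicted at the N-terminus


def splice_agreement(candidate_sites: frozenset[int], ortholog_sites: frozenset[int]) -> float:
    """Hypothetical feature: Jaccard overlap of splice-site positions."""
    union = candidate_sites | ortholog_sites
    if not union:
        return 1.0  # both models are intronless: treat as full agreement
    return len(candidate_sites & ortholog_sites) / len(union)


def rerank(
    candidates: list[Candidate],
    ortholog_sites: frozenset[int],
    ortholog_has_signal_peptide: bool,
    weights: dict[str, float],
) -> list[Candidate]:
    """Sort candidates by a weighted sum of the base score and cross-species features."""

    def global_score(c: Candidate) -> float:
        features = {
            "base": c.base_score,
            "identity": c.protein_identity,
            "splice": splice_agreement(c.splice_sites, ortholog_sites),
            # reward agreement with the ortholog, whether a signal peptide is present or absent
            "signal": float(c.has_signal_peptide == ortholog_has_signal_peptide),
        }
        return sum(weights[name] * value for name, value in features.items())

    return sorted(candidates, key=global_score, reverse=True)


# Hypothetical locus: the gene finder's top model (model_1) disagrees with the ortholog.
ortholog_sites = frozenset({120, 480, 910})
candidates = [
    Candidate("model_1", 0.90, 0.55, frozenset({120, 500}), False),
    Candidate("model_2", 0.85, 0.92, frozenset({120, 480, 910}), True),
    Candidate("model_3", 0.60, 0.40, frozenset(), True),
]
weights = {"base": 1.0, "identity": 2.0, "splice": 1.5, "signal": 0.5}  # hand-set, hypothetical

ranked = rerank(candidates, ortholog_sites, True, weights)
print([c.model_id for c in ranked])  # ['model_2', 'model_1', 'model_3']
```

Here the candidate with the lower original score (model_2) moves to the top because it matches the ortholog's splice sites, protein sequence, and signal peptide status. A linear combination is only one possible design; in a real setting the weights would presumably be fit on loci with known gene structures rather than chosen by hand.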

CONCLUSION

Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d618/2587481/de25b4191c70/1471-2105-9-433-1.jpg
