Rapid annotation of anonymous sequences from genome projects using semantic similarities and a weighting scheme in gene ontology.

Author information

Fontana Paolo, Cestaro Alessandro, Velasco Riccardo, Formentin Elide, Toppo Stefano

Affiliation

FEM-IASMA Research Center, San Michele all'Adige (TN), Italy.

Publication information

PLoS One. 2009;4(2):e4619. doi: 10.1371/journal.pone.0004619. Epub 2009 Feb 27.

Abstract

BACKGROUND

Large-scale sequencing projects have now become routine lab practice, and this has led to the development of a new generation of tools, including function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology (GO), with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task.

METHODOLOGY

We present here a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that can quickly process thousands of sequences for functional inference. For the first time, the tool exploits an integrated approach that combines clustering of GO terms, based on their semantic similarities, with a weighting scheme that assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods; in this work, we based ARGOT processing on BLAST results.
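To make the two ingredients named above concrete (weighting GO terms carried by BLAST hits, then grouping the terms by semantic similarity), here is a minimal Python sketch. It is not ARGOT's implementation. The toy ontology, the background annotation counts and the hit list are hypothetical. The per-hit weight of -log10(e-value), Lin's information-content similarity, the 0.5 threshold and the greedy clustering are assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy ontology (child -> parents); real GO IDs are not used here.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalytic": {"root"},
    "protein_binding": {"binding"},
    "dna_binding": {"binding"},
    "kinase": {"catalytic"},
    "protein_kinase": {"kinase"},
}

# Hypothetical direct-annotation counts from a background corpus.
COUNTS = {"root": 0, "binding": 10, "catalytic": 10, "protein_binding": 30,
          "dna_binding": 20, "kinase": 10, "protein_kinase": 20}


def ancestors(term):
    """Return the term itself plus every ancestor reachable in the DAG."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -ln p(t), where p(t) counts annotations to t or its descendants."""
    cumulative = defaultdict(int)
    for term, n in COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += n
    total = cumulative["root"]
    return {t: -math.log(cumulative[t] / total) for t in PARENTS}


def lin_similarity(a, b, ic):
    """Lin similarity: 2*IC(most informative common ancestor) / (IC(a) + IC(b))."""
    mica = max(ic[t] for t in ancestors(a) & ancestors(b))
    denom = ic[a] + ic[b]
    return 1.0 if denom == 0 else 2 * mica / denom


def score_terms(hits):
    """Assumed weighting: each hit adds -log10(e-value) to every term it carries."""
    scores = defaultdict(float)
    for evalue, terms in hits:
        weight = -math.log10(max(evalue, 1e-300))  # floor avoids log10(0)
        for term in terms:
            scores[term] += weight
    return dict(scores)


def cluster_terms(scores, ic, threshold):
    """Greedy clustering: best-scoring terms seed clusters; others join a similar seed."""
    clusters = []
    for term in sorted(scores, key=scores.get, reverse=True):
        for cluster in clusters:
            if lin_similarity(term, cluster[0], ic) >= threshold:
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters


def annotate(hits, threshold=0.5):
    """Rank clusters by summed weight; report each cluster's seed term."""
    ic = information_content()
    scores = score_terms(hits)
    clusters = cluster_terms(scores, ic, threshold)
    ranked = [(c[0], sum(scores[t] for t in c)) for c in clusters]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical BLAST hits: (e-value, GO-like terms annotated on the subject).
    hits = [(1e-50, ["protein_kinase", "kinase"]), (1e-30, ["protein_kinase"]),
            (1e-20, ["protein_binding"]), (1e-5, ["dna_binding"])]
    for term, score in annotate(hits):
        print(f"{term}\t{score:.1f}")
```

With these toy hits, `kinase` is similar enough to `protein_kinase` to join its cluster, so that cluster ranks first with a summed weight of about 130, ahead of `protein_binding` and `dna_binding`. Clustering before ranking keeps closely related terms from competing as separate predictions.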

CONCLUSIONS

The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome, and a small subset of proteins used for comparison with other available tools. The algorithm was shown to outperform existing methods and, owing to its high sensitivity, specificity and coverage, to be suitable for predicting the function of single proteins.
