Fontana Paolo, Cestaro Alessandro, Velasco Riccardo, Formentin Elide, Toppo Stefano
FEM-IASMA Research Center, San Michele all'Adige (TN), Italy.
PLoS One. 2009;4(2):e4619. doi: 10.1371/journal.pone.0004619. Epub 2009 Feb 27.
Large-scale sequencing projects have become routine laboratory practice. This has driven the development of a new generation of tools for function prediction and brought these methods back to the fore. The advent of the Gene Ontology (GO), with its structured vocabulary and paradigm, has given computational biologists an appropriate framework for this task.
We present a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that can quickly process thousands of sequences for functional inference. The tool is the first to use an integrated approach with two parts. It clusters GO terms based on their semantic similarity. It combines this with a weighting scheme that assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits can be obtained by different methods; in this work, ARGOT processing is based on BLAST results.
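The two ingredients described above, weighting retrieved hits and clustering GO terms by semantic similarity, can be illustrated with a minimal sketch. This is not ARGOT's implementation. The following are all assumptions made for illustration:

- the toy ontology in `PARENTS` and its term names;
- the hypothetical helper names;
- the use of -log10(e-value) as a hit weight;
- the Jaccard overlap of ancestor sets, a simple stand-in swapped in for a proper semantic similarity measure;
- the 0.5 clustering threshold.

```python
import math
from collections import defaultdict

# Hypothetical toy ontology: each term maps to its direct parents (a small DAG).
# Term names and structure are illustrative only, not real GO data.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalytic": ["root"],
    "ion_binding": ["binding"],
    "metal_ion_binding": ["ion_binding"],
    "zinc_ion_binding": ["metal_ion_binding"],
    "hydrolase": ["catalytic"],
}

def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def similarity(a, b):
    """Jaccard overlap of ancestor sets (assumed stand-in for semantic similarity)."""
    sa, sb = ancestors(a), ancestors(b)
    return len(sa & sb) / len(sa | sb)

def term_scores(hits, max_weight=300.0):
    """Sum an assumed -log10(e-value) weight over every hit carrying each term."""
    scores = defaultdict(float)
    for _hit_id, evalue, terms in hits:
        weight = max_weight if evalue == 0 else min(max_weight, -math.log10(evalue))
        for t in set(terms):
            scores[t] += weight
    return dict(scores)

def cluster_terms(terms, threshold=0.5):
    """Single-linkage grouping: a term joins every cluster it is similar enough to."""
    clusters = []
    for t in sorted(terms):
        linked = [c for c in clusters if any(similarity(t, m) >= threshold for m in c)]
        merged = {t}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters

def rank_clusters(hits, threshold=0.5):
    """Score each cluster by the summed weights of its terms; report its best term."""
    scores = term_scores(hits)
    ranked = []
    for cluster in cluster_terms(scores, threshold):
        best = max(cluster, key=lambda t: (scores[t], t))
        ranked.append((best, sum(scores[t] for t in cluster), sorted(cluster)))
    return sorted(ranked, key=lambda r: -r[1])

# Hypothetical BLAST-like hits: (hit id, e-value, GO-like terms of the hit).
EXAMPLE_HITS = [
    ("hitA", 1e-50, ["zinc_ion_binding", "hydrolase"]),
    ("hitB", 1e-20, ["metal_ion_binding"]),
    ("hitC", 1e-5, ["hydrolase"]),
]

ranked = rank_clusters(EXAMPLE_HITS)
for best, total, members in ranked:
    print(f"{best}\t{total:.1f}\t{members}")
```

In this toy run, `hydrolase` has the highest single term score (about 55). However, `zinc_ion_binding` and `metal_ion_binding` are semantically close and fall into one cluster, where their weights reinforce each other (about 70 in total). The binding cluster therefore ranks first. This illustrates why grouping related terms can change the outcome compared with scoring each term in isolation.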
The extensive benchmark involved three data sets:

- 10,000 protein sequences;
- the complete S. cerevisiae genome;
- a small subset of proteins used for comparison with other available tools.

The algorithm was shown to outperform existing methods. Owing to its high sensitivity, specificity and coverage, it is also suitable for function prediction of single proteins.