King Oliver D, Foulger Rebecca E, Dwight Selina S, White James V, Roth Frederick P
Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts 02115, USA.
Genome Res. 2003 May;13(5):896-904. doi: 10.1101/gr.440803. Epub 2003 Apr 14.
The Gene Ontology (GO) Consortium has produced a controlled vocabulary for annotation of gene function that is used in many organism-specific gene annotation databases. This allows the prediction of gene function based on patterns of annotation. For example, if annotations for two attributes tend to occur together in a database, then a gene holding one attribute is likely to hold the other as well. We modeled the relationships among GO attributes with decision trees and Bayesian networks, using the annotations in the Saccharomyces Genome Database (SGD) and in FlyBase as training data. We tested the models using cross-validation, and we manually assessed 100 gene-attribute associations that were predicted by the models but that were not present in the SGD or FlyBase databases. Of the 100 manually assessed associations, 41 were judged to be true, and another 42 were judged to be plausible.
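The co-occurrence intuition in the abstract can be illustrated with a minimal sketch. This is not the decision-tree or Bayesian-network modeling the authors describe; it is a much simpler stand-in that estimates the conditional probability P(b | a) from annotation counts and proposes attribute b for genes that hold a but lack b. The gene names, attribute labels, the `min_prob` threshold, and both function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_scores(annotations):
    """Estimate P(b | a) for every ordered pair of attributes seen together."""
    single = Counter()  # number of genes annotated with each attribute
    pair = Counter()    # number of genes annotated with both a and b
    for attrs in annotations.values():
        single.update(attrs)
        pair.update(permutations(sorted(attrs), 2))
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


def predict_missing(annotations, min_prob=0.6):
    """Propose (gene, attribute, score) triples absent from the annotations."""
    scores = cooccurrence_scores(annotations)
    predictions = []
    for gene, attrs in annotations.items():
        best = {}  # best supporting score for each candidate attribute
        for (a, b), p in scores.items():
            if a in attrs and b not in attrs and p >= min_prob:
                best[b] = max(p, best.get(b, 0.0))
        predictions.extend((gene, b, p) for b, p in best.items())
    # Highest-scoring predictions first; ties broken by gene, then attribute.
    return sorted(predictions, key=lambda t: (-t[2], t[0], t[1]))


# Toy annotation table (hypothetical genes and attributes, not SGD or FlyBase data).
annotations = {
    "geneA": {"DNA repair", "nucleus"},
    "geneB": {"DNA repair", "nucleus"},
    "geneC": {"DNA repair", "nucleus"},
    "geneD": {"DNA repair"},
    "geneE": {"nucleus", "transcription"},
}

for gene, attribute, score in predict_missing(annotations):
    print(f"{gene}: {attribute} (estimated probability {score:.2f})")
```

In this toy table, three of the four genes annotated with "DNA repair" are also annotated with "nucleus", so the sketch proposes "nucleus" for geneD with an estimated probability of 0.75. A pairwise estimate like this ignores interactions among more than two attributes, which is one reason to prefer richer models such as the decision trees and Bayesian networks named in the abstract.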