Llewellyn Richard, Eisenberg David S
Department of Energy Institute for Genomics and Proteomics, and Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095, USA.
Proc Natl Acad Sci U S A. 2008 Nov 18;105(46):17700-5. doi: 10.1073/pnas.0809583105. Epub 2008 Nov 12.
As genome sequencing outstrips the rate of high-quality, low-throughput biochemical and genetic experimentation, accurate annotation of protein function becomes a bottleneck in the progress of the biomolecular sciences. Most gene products are now annotated by homology, in which an experimentally determined function is applied to a similar sequence. This procedure becomes error-prone between more divergent sequences and can contaminate biomolecular databases. Here, we propose a computational method of assignment of function, termed Generalized Functional Linkages (GFL), that combines nonhomology-based methods with other types of data. Functional linkages describe pairwise relationships between proteins that work together to perform a biological task. GFL provides a Bayesian framework that improves annotation by arbitrating a competition among biological process annotations to best describe the target protein. GFL addresses the unequal strengths of functional linkages among proteins, the quality of existing annotations, and the similarity among them while incorporating available knowledge about the cellular location or individual molecular function of the target protein. We demonstrate GFL with functional linkages defined by an algorithm known as zorch that quantifies connectivity in protein-protein interaction networks. Even when using proteins linked only by indirect or high-throughput interactions, GFL predicts the biological processes of many proteins in Saccharomyces cerevisiae, improving the accuracy of annotation by 20% over majority voting.
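The contrast between majority voting and a probabilistic competition among annotations can be illustrated with a toy sketch. This is not the GFL algorithm or zorch as published. It assumes a deliberately simple naive-Bayes model in which each linked protein carries one biological process term and a hypothetical linkage weight w (0 < w < 1), read as the chance that the linked protein shares the target's process. The function names, the weights, and the three example terms are all illustrative assumptions.

```python
from collections import Counter


def majority_vote(neighbors):
    """Baseline: pick the most common annotation, ignoring linkage weights.

    neighbors: list of (annotation_term, linkage_weight) pairs.
    """
    counts = Counter(term for term, _ in neighbors)
    return counts.most_common(1)[0][0]


def weighted_posterior(neighbors, prior):
    """Toy Bayesian competition among candidate annotations (an assumption,
    not the published GFL model).

    For a candidate process c and a neighbor annotated t with weight w:
      P(neighbor has t | target has c) = w                 if t == c
                                       = (1 - w) / (K - 1) otherwise
    where K is the number of candidate terms in `prior`.
    Returns a dict mapping each candidate to its normalized posterior.
    """
    k = len(prior)
    scores = dict(prior)  # start each candidate at its prior probability
    for term, w in neighbors:
        for cand in scores:
            if cand == term:
                scores[cand] *= w
            else:
                scores[cand] *= (1.0 - w) / (k - 1)
    total = sum(scores.values())
    return {cand: s / total for cand, s in scores.items()}


# Hypothetical target protein with two weak links and one strong link.
neighbors = [("transport", 0.4), ("transport", 0.4), ("DNA repair", 0.95)]
prior = {"transport": 1 / 3, "DNA repair": 1 / 3, "translation": 1 / 3}

baseline = majority_vote(neighbors)
posterior = weighted_posterior(neighbors, prior)
best = max(posterior, key=posterior.get)
print(baseline, best)
```

In this made-up example the two weak links outvote the single strong link under majority voting, while the weighted posterior favors the annotation supported by the strong link. A non-uniform `prior` is where knowledge such as cellular location could enter under the same assumptions.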