Wu Jie, Mellor Joseph C, DeLisi Charles
Department of Biomedical Engineering, Boston University, 24 Cummington St., Boston, MA 02215, USA.
Genome Inform. 2005;16(1):142-9.
Phylogenetic profiling has become an effective computational method for detecting functional associations between proteins. The method links two proteins according to the similarity of their phyletic distributions across a set of genomes. While pairwise linkage is useful, it misses correlations in higher-order groups: triplets, quadruplets, and so on. Here we assess the probability of observing co-occurrence patterns of three binary profiles by chance and show that this probability is asymptotically equivalent to the mutual information among the three profiles. We demonstrate the utility of the probability and mutual information metrics in detecting over-represented triplets of orthologous proteins that could not be detected using pairwise profiles. These triplets serve as small building blocks, i.e., motifs in protein networks; they allow us to infer the function of uncharacterized members and facilitate analysis of the local structure and global organization of the protein network. Our method is extensible to N-component clusters and therefore serves as a general tool for higher-order protein function annotation.
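The abstract gives no formulas, so the Python sketch below is only one plausible reading of the two metrics, not the authors' implementation. It assumes that the three-profile "mutual information" is the total correlation H(X)+H(Y)+H(Z)-H(X,Y,Z), and that "probability by chance" means the probability of the observed 2x2x2 co-occurrence table when each profile is independently shuffled with its presence/absence counts held fixed. Both definitions, all function names, and the toy profiles are hypothetical. Under these assumptions, Stirling's approximation gives -ln P ≈ N * TC (TC in nats, N genomes), which is one way to read the asymptotic equivalence claimed above. In the invented profiles, z is present exactly when one of x and y is present, so every pair looks independent while the triplet is strongly dependent.

```python
import math
from collections import Counter


def entropy_bits(*profiles):
    """Empirical joint Shannon entropy (bits) of one or more aligned binary profiles."""
    n = len(profiles[0])
    counts = Counter(zip(*profiles)).values()
    return -sum(c / n * math.log2(c / n) for c in counts)


def mutual_information(x, y):
    """Pairwise mutual information I(X;Y) in bits."""
    return entropy_bits(x) + entropy_bits(y) - entropy_bits(x, y)


def total_correlation(x, y, z):
    """ASSUMED three-profile 'mutual information': H(X)+H(Y)+H(Z)-H(X,Y,Z), in bits."""
    return entropy_bits(x) + entropy_bits(y) + entropy_bits(z) - entropy_bits(x, y, z)


def _log_factorial(k):
    """ln(k!) via the log-gamma function."""
    return math.lgamma(k + 1)


def log_prob_by_chance(x, y, z):
    """ASSUMED null model: natural log of the probability of the observed 2x2x2
    table when each profile is permuted independently (one-way marginals fixed).
    P = prod(marginal counts!) / ((N!)^2 * prod(cell counts!))."""
    n = len(x)
    marginal_terms = sum(
        _log_factorial(c) for p in (x, y, z) for c in Counter(p).values()
    )
    cell_terms = sum(_log_factorial(c) for c in Counter(zip(x, y, z)).values())
    return marginal_terms - 2 * _log_factorial(n) - cell_terms


# Hypothetical profiles over 40 genomes: z is present iff exactly one of x, y is.
x = [0, 0, 1, 1] * 10
y = [0, 1, 0, 1] * 10
z = [a ^ b for a, b in zip(x, y)]

for name, (a, b) in {"x,y": (x, y), "x,z": (x, z), "y,z": (y, z)}.items():
    print(name, "pairwise MI (bits):", mutual_information(a, b))
print("triplet total correlation (bits):", total_correlation(x, y, z))
print("-ln P / N (nats):", -log_prob_by_chance(x, y, z) / len(x))
```

Here each pairwise mutual information is 0 bits and the total correlation is 1 bit, so only the triplet-level metric flags the association; -ln P / N comes out at about 0.68 nats, close to ln 2 ≈ 0.69, and the finite-size gap shrinks as the number of genomes grows.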