Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, India.
PLoS One. 2013;8(1):e54325. doi: 10.1371/journal.pone.0054325. Epub 2013 Jan 17.
Cellular activities are governed by physical and functional interactions among the many proteins involved in biological pathways. With sequenced genomes and high-throughput experimental data now available, genome-wide protein-protein interactions can be identified using a variety of computational techniques. Comparative assessments of how well these techniques predict protein interactions have been reported frequently in the literature, but their ability to elucidate a particular biological pathway has not.
To understand how well interactions among the proteins of specific biological pathways can be predicted, we analyzed 14 biological pathways of Escherichia coli catalogued in the KEGG database using five methods for predicting protein-protein functional linkages. These methods are phylogenetic profiling, gene neighborhood, co-presence of orthologous genes in the same gene clusters, a mirrortree variant, and expression similarity.
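As an illustration of the first of these methods, phylogenetic profiling scores a pair of genes by how similarly they are present or absent across a set of genomes. The following is a minimal sketch only, not the procedure used in this study: the toy profiles, the gene names, and the choice of Jaccard similarity as the score are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy data (not from the study): each gene maps to its
# presence (1) or absence (0) in eight reference genomes.
profiles = {
    "geneA": (1, 1, 0, 1, 0, 1, 1, 0),
    "geneB": (1, 1, 0, 1, 0, 1, 0, 0),
    "geneC": (0, 0, 1, 0, 1, 0, 0, 1),
}


def jaccard(p, q):
    """Jaccard similarity of two equal-length presence/absence profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    # Two genes absent everywhere carry no co-occurrence signal.
    return both / either if either else 0.0


def rank_pairs(profiles):
    """Score every gene pair and return them from most to least similar."""
    scored = [
        (jaccard(profiles[g], profiles[h]), g, h)
        for g, h in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, g, h in rank_pairs(profiles):
        print(f"{g}\t{h}\t{score:.2f}")
```

In this sketch, high-scoring pairs would be treated as candidate functional linkages. Other scores, such as mutual information or correlation, could be substituted for the Jaccard index without changing the overall structure.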
Our results reveal that predicting interactions among metabolic pathway proteins remains a challenging task for all methods, which possibly reflects the flexible or independent evolutionary histories of these proteins. The methods predicted functional associations among proteins of the amino acid, nucleotide, glycan, and vitamin and cofactor pathways slightly better than those of carbohydrate, lipid and energy metabolism, where performance was close to random. We make similar observations for interactions among environmental information processing proteins. In contrast, protein-protein linkages in genetic information processing, or in specialized processes such as motility that occur only in a subset of organisms, are predicted with comparable accuracy. Metabolic pathways are best predicted using the neighborhood of orthologous genes, whereas phyletic patterns are sufficient to reconstruct protein interactions in central dogma pathways. We also show that the effective use of a particular prediction method depends on the pathway under investigation. When no specific pathway is targeted, the gene expression similarity method is the best option.