Bioinformatics Institute, University of Auckland, Auckland, New Zealand.
PLoS One. 2009 Dec 29;4(12):e8487. doi: 10.1371/journal.pone.0008487.
Recent studies have shown evidence for the coevolution of functionally related genes. This coevolution results from constraints that maintain functional relationships between interacting proteins. These studies have focused on the correlation in gene tree branch lengths of proteins that interact directly with each other. Here we hypothesize that the correlation in branch lengths is not limited to proteins that interact directly, but extends to proteins that operate within the same pathway. Using generalized linear models as the basis for identifying correlation, we attempted to predict the gene ontology (GO) terms of a gene from its gene tree branch lengths. We applied our method to a dataset consisting of proteins from ten prokaryotic species. We found that the accuracy with which we could predict the function of proteins from their gene trees varied substantially among GO terms. In particular, our model could accurately predict genes involved in translation and certain ribosomal activities, with an area under the receiver operating characteristic (ROC) curve of up to 92%. Further analysis showed that the similarity between the trees of genes labeled with similar GO terms was not limited to genes that physically interact, but extended to genes functioning within the same pathway. We discuss the relevance of our findings to the use of phylogenetic methods in comparative genomics.
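The approach described in the abstract (a generalized linear model that predicts a GO label from gene tree branch lengths, scored by the area under the ROC curve) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the data are simulated, each gene is represented by ten hypothetical branch lengths (one per species), genes carrying the hypothetical GO label are simply given shorter branches on average, and the GLM is a plain logistic regression fitted by gradient ascent.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function (inverse of the logit link).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_predictor(w, x):
    # w[0] is the intercept; w[1:] are per-branch coefficients.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic_glm(X, y, lr=0.1, epochs=500):
    """Fit a binomial GLM (logit link) by batch gradient ascent on the log-likelihood."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(linear_predictor(w, xi))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties contribute 0.5. Requires at least one positive and one negative label.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def simulate_genes(n_genes, n_branches=10, seed=0):
    """Simulated data only: labeled genes get shorter branches on average (an assumption)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_genes):
        label = 1 if rng.random() < 0.3 else 0
        mean_length = 0.5 if label else 1.0
        # expovariate takes a rate, so the mean branch length is 1 / rate.
        X.append([rng.expovariate(1.0 / mean_length) for _ in range(n_branches)])
        y.append(label)
    return X, y


def run_demo():
    # Fit on one set of simulated genes, then score a held-out set.
    X, y = simulate_genes(500, seed=0)
    X_train, y_train = X[:300], y[:300]
    X_test, y_test = X[300:], y[300:]
    w = fit_logistic_glm(X_train, y_train)
    scores = [sigmoid(linear_predictor(w, x)) for x in X_test]
    return roc_auc(scores, y_test)


if __name__ == "__main__":
    print("held-out AUC on simulated data:", round(run_demo(), 3))
```

In a real analysis one model of this kind would presumably be fitted per GO term, which is consistent with the abstract's observation that predictive accuracy varies substantially among terms.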