Identification of functional links between genes using phylogenetic profiles.

Authors

Wu Jie, Kasif Simon, DeLisi Charles

Affiliation

Department of Biomedical Engineering, Bioinformatics Graduate Program, Boston University, 44 Cummington St., Boston, MA 02215, USA.

Publication

Bioinformatics. 2003 Aug 12;19(12):1524-30. doi: 10.1093/bioinformatics/btg187.

Abstract

MOTIVATION

Genes with identical patterns of occurrence across the phyla tend to function together in the same protein complexes or participate in the same biochemical pathway. However, the requirement that the profiles be identical (i) severely restricts the number of functional links that can be established by such phylogenetic profiling; (ii) limits detection to very strong functional links, failing to capture relations between genes that are not in the same pathway, but nevertheless subserve a common function and (iii) misses relations between analogous genes. Here we present and apply a method for relaxing the restriction, based on the probability that a given arbitrary degree of similarity between two profiles would occur by chance, with no biological pressure. Function is then inferred at any desired level of confidence.
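To make the contrast between identical and merely similar profiles concrete, here is a minimal sketch. It assumes a phylogenetic profile is stored as a list of 0/1 flags, one per genome (1 = gene present). The gene names, the eight-genome toy data, and the helper functions are hypothetical and do not come from the paper.

```python
def hamming_distance(p, q):
    """Number of genomes in which exactly one of the two genes is present."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(a != b for a, b in zip(p, q))


def co_occurrences(p, q):
    """Number of genomes in which both genes are present."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(a & b for a, b in zip(p, q))


# Hypothetical toy profiles over 8 genomes (1 = present, 0 = absent).
gene_a = [1, 1, 0, 1, 0, 0, 1, 1]
gene_b = [1, 1, 0, 1, 0, 0, 1, 1]  # identical to gene_a
gene_c = [1, 1, 0, 1, 0, 1, 1, 1]  # differs from gene_a in one genome

# Identical-profile matching links only gene_a and gene_b.
identical_link = hamming_distance(gene_a, gene_b) == 0
# A relaxed criterion could also consider gene_a and gene_c,
# which share 5 genomes and disagree in only 1.
near_match = (hamming_distance(gene_a, gene_c), co_occurrences(gene_a, gene_c))
```

Under the strict criterion only the first pair would be linked. Whether the second pair counts as linked depends on how likely its degree of similarity is to arise by chance, which is the question the method addresses.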

RESULTS

We derive an expression for the probability distribution of a given number of chance co-occurrences of a pair of non-homologous orthologs across a set of genomes. The method is applied to 2905 clusters of orthologous genes (COGs) from 44 fully sequenced microbial genomes representing all three domains of life. Among the results are the following. (1) Of the 51 000 annotated intrapathway gene pairs, 8935 are linked at a level of significance of 0.01. This is over 30-fold greater than the 271 intrapathway pairs obtained at the same confidence level when identical profiles are used. (2) Of the 540 000 interpathway gene pairs, some 65 000 are linked at the 0.01 level of significance, some 12 standard deviations beyond the number expected by chance at this confidence level. We speculate that many of these links involve nearest-neighbor pathways, and discuss some examples. (3) The difference in the percentage of linked interpathway and intrapathway genes is highly significant, consistent with the intuitive expectation that genes in the same pathway are generally under greater selective pressure than those that are not. (4) The method appears to recover metabolic networks well. This is illustrated by the TCA cycle, which is recovered as a highly connected, weighted edge network of 30 of its 31 COGs. (5) The fraction of pairs having a common pathway is a symmetric function of the Hamming distance between their profiles. This finding, that the functional correlation between profiles with near maximum Hamming distance is as large as that between profiles with near zero Hamming distance, and as statistically significant, is plausibly explained if the former group represents analogous genes.
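The abstract does not give the derived expression. As an illustration only, the sketch below assumes a hypergeometric null model: if gene A occurs in `n_a` of `n_genomes` genomes and gene B in `n_b`, and the two are placed independently, the number of shared genomes follows a hypergeometric distribution. The function names and the example counts are hypothetical; only the total of 44 genomes is taken from the text.

```python
from math import comb


def cooccurrence_pmf(k, n_genomes, n_a, n_b):
    """P(exactly k shared genomes) under the assumed hypergeometric null."""
    lowest = max(0, n_a + n_b - n_genomes)
    highest = min(n_a, n_b)
    if k < lowest or k > highest:
        return 0.0
    return comb(n_a, k) * comb(n_genomes - n_a, n_b - k) / comb(n_genomes, n_b)


def cooccurrence_p_value(k, n_genomes, n_a, n_b):
    """P(at least k shared genomes) under the same null (upper tail)."""
    highest = min(n_a, n_b)
    return sum(cooccurrence_pmf(j, n_genomes, n_a, n_b)
               for j in range(k, highest + 1))


# Hypothetical pair: each gene present in 20 of 44 genomes, 18 genomes shared.
p_value = cooccurrence_p_value(18, 44, 20, 20)
linked_at_001 = p_value < 0.01
```

A pair would be called linked when this tail probability falls below the chosen significance level, such as the 0.01 threshold quoted above. The symmetric result in (5) suggests that the opposite tail (unusually few co-occurrences) may be informative as well; this sketch covers the upper tail only.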
