Identification of functional links between genes using phylogenetic profiles.

Authors

Wu Jie, Kasif Simon, DeLisi Charles

Affiliation

Department of Biomedical Engineering, USA Bioinformatics Graduate Program, Boston University, 44 Cummington St., Boston, MA, 02215, USA.

Publication

Bioinformatics. 2003 Aug 12;19(12):1524-30. doi: 10.1093/bioinformatics/btg187.

DOI:10.1093/bioinformatics/btg187
PMID:12912833
Abstract

MOTIVATION

Genes with identical patterns of occurrence across the phyla tend to function together in the same protein complexes or participate in the same biochemical pathway. However, the requirement that the profiles be identical (i) severely restricts the number of functional links that can be established by such phylogenetic profiling; (ii) limits detection to very strong functional links, failing to capture relations between genes that are not in the same pathway, but nevertheless subserve a common function and (iii) misses relations between analogous genes. Here we present and apply a method for relaxing the restriction, based on the probability that a given arbitrary degree of similarity between two profiles would occur by chance, with no biological pressure. Function is then inferred at any desired level of confidence.

RESULTS

We derive an expression for the probability distribution of a given number of chance co-occurrences of a pair of non-homologous orthologs across a set of genomes. The method is applied to 2905 clusters of orthologous genes (COGs) from 44 fully sequenced microbial genomes representing all three domains of life. Among the results are the following. (1) Of the 51 000 annotated intrapathway gene pairs, 8935 are linked at a significance level of 0.01. This is over 30-fold greater than the 271 intrapathway pairs obtained at the same confidence level when identical profiles are used. (2) Of the 540 000 interpathway gene pairs, some 65 000 are linked at the 0.01 level of significance, some 12 standard deviations beyond the number expected by chance at this confidence level. We speculate that many of these links involve nearest-neighbor paths, and discuss some examples. (3) The difference in the percentage of linked interpathway and intrapathway genes is highly significant, consistent with the intuitive expectation that genes in the same pathway are generally under greater selective pressure than those that are not. (4) The method appears to recover metabolic networks well. This is illustrated by the TCA cycle, which is recovered as a highly connected, weighted-edge network of 30 of its 31 COGs. (5) The fraction of pairs having a common pathway is a symmetric function of the Hamming distance between their profiles. This finding, that the functional correlation between profiles with near-maximum Hamming distance is as large as that between profiles with near-zero Hamming distance, and as statistically significant, is plausibly explained if the former group represents analogous genes.
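
The abstract does not reproduce the derived expression, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypergeometric null model: if one gene occurs in x of n genomes and another in y of n, and the two are distributed independently, the number of genomes containing both follows a hypergeometric distribution. The function names, the toy profiles, and the choice of null model are all assumptions made for illustration.

```python
from math import comb


def cooccurrence_pmf(n, x, y, z):
    """P(exactly z shared genomes) when a gene found in x of n genomes and a
    gene found in y of n genomes are distributed independently (assumed null)."""
    if z < max(0, x + y - n) or z > min(x, y):
        return 0.0
    return comb(x, z) * comb(n - x, y - z) / comb(n, y)


def tail_p_values(n, x, y, z):
    """Return (P(Z >= z), P(Z <= z)) under the hypergeometric null."""
    lo, hi = max(0, x + y - n), min(x, y)
    upper = sum(cooccurrence_pmf(n, x, y, k) for k in range(z, hi + 1))
    lower = sum(cooccurrence_pmf(n, x, y, k) for k in range(lo, z + 1))
    return upper, lower


def compare_profiles(a, b):
    """Summarise two equal-length 0/1 presence/absence profiles."""
    n = len(a)
    x, y = sum(a), sum(b)
    z = sum(p & q for p, q in zip(a, b))         # genomes containing both genes
    hamming = sum(p != q for p, q in zip(a, b))  # positions where profiles differ
    upper, lower = tail_p_values(n, x, y, z)
    return {"shared": z, "hamming": hamming, "p_upper": upper, "p_lower": lower}


# Toy profiles over 10 hypothetical genomes (not data from the paper).
gene_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
gene_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # nearly identical to gene_a
gene_c = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # exact complement of gene_a

similar = compare_profiles(gene_a, gene_b)
opposite = compare_profiles(gene_a, gene_c)
print(similar)
print(opposite)
```

In this toy example the near-identical pair is unlikely in the upper tail (many shared genomes), while the complementary pair is unlikely in the lower tail (no shared genomes). Under the stated assumptions, this is one way to see how profiles at both small and near-maximum Hamming distance could each be statistically significant, as result (5) describes.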
