Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment.

Authors

Jothi Raja, Przytycka Teresa M, Aravind L

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Publication

BMC Bioinformatics. 2007 May 23;8:173. doi: 10.1186/1471-2105-8-173.

Abstract

BACKGROUND

A widely used approach for discovering functional and physical interactions among proteins involves phylogenetic profile comparisons (PPCs). Here, proteins with similar profiles are inferred to be functionally related, under the assumption that proteins involved in the same metabolic pathway or cellular system are likely to have been co-inherited during evolution.
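The idea of comparing phylogenetic profiles can be illustrated with a minimal sketch. It assumes each protein is represented by a binary presence/absence vector over a reference set of genomes and uses the Jaccard index as the similarity score. The protein names, the six-genome profiles, and the choice of Jaccard similarity are hypothetical and chosen only for illustration; the abstract does not state which similarity measure the authors used.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) profiles over six reference genomes.
# These values are invented for illustration and are not data from the paper.
profiles = {
    "protA": (1, 1, 0, 1, 0, 1),
    "protB": (1, 1, 0, 1, 0, 0),
    "protC": (0, 0, 1, 0, 1, 0),
}


def jaccard(p, q):
    """Jaccard index of two binary profiles: shared presences / total presences."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def rank_pairs(profiles):
    """Score every protein pair and return the pairs sorted by decreasing similarity."""
    scored = [
        (jaccard(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, a, b in rank_pairs(profiles):
        print(f"{a} - {b}: {score:.2f}")
```

In this toy example, protA and protB co-occur in three of the four genomes in which either is present, so they would be the top candidate for a functional link, whereas protC never co-occurs with either of them.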

RESULTS

Our experiments with E. coli and yeast proteins, using 16 different carefully composed reference sets of genomes, revealed that the phyletic patterns of proteins in prokaryotes alone could be adequate to make reasonably accurate functional linkage predictions. A slight improvement in performance is observed on adding a few eukaryotes to the reference set, but a noticeable drop-off in performance is observed with an increased number of eukaryotes. Inclusion of most parasitic, pathogenic or vertebrate genomes, and of multiple strains of the same species, in the reference set does not necessarily contribute to improved sensitivity or accuracy. Interestingly, we also found that the evolutionary histories of individual pathways have a significant effect on the performance of the PPC approach with respect to a particular reference set. For example, for accurately predicting functional links in carbohydrate or lipid metabolism, a reference set composed solely of prokaryotic (or bacterial) genomes performed among the best compared to one composed of genomes from all three super-kingdoms; in contrast, for predicting functional links in translation, a reference set composed of prokaryotic (or bacterial) genomes performed the worst. We also demonstrate that the widely used random null model for quantifying the statistical significance of profile similarity is incomplete, which could result in an increased number of false positives.
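To make the notion of a random null model concrete, the following is a minimal sketch of one common formulation: a hypergeometric tail probability for the number of genomes in which two proteins co-occur, treating the genomes as independent draws. This formulation is an assumption made for illustration; the abstract does not specify the form of the null model or what it omits, and the function name and arguments are hypothetical.

```python
from math import comb


def hypergeom_pvalue(n_genomes, n_a, n_b, n_shared):
    """Probability of seeing at least n_shared co-occurrences by chance.

    Assumes protein A is present in n_a of n_genomes genomes, and that the
    n_b genomes containing protein B are a uniformly random subset.
    """
    total = comb(n_genomes, n_b)
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_genomes - n_a, n_b - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 6 genomes, A present in 4, B present in 3, all 3 shared.
    print(hypergeom_pvalue(6, 4, 3, 3))
```

A small p-value under such a model would mark a pair of profiles as more similar than expected by chance, which is the kind of significance call the abstract cautions may produce extra false positives.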

CONCLUSION

Contrary to previous proposals, it is not merely the number of genomes but a careful selection of informative genomes in the reference set that influences the prediction accuracy of the PPC approach. We note that the predictive power of the PPC approach, especially in eukaryotes, is heavily influenced by the primary endosymbiosis and subsequent bacterial contributions. The over-representation of parasitic unicellular eukaryotes and vertebrates additionally makes eukaryotes less useful in the reference sets. Reference sets composed of a highly non-redundant set of genomes from all three super-kingdoms fare better with pathways showing considerable vertical inheritance and strong conservation (e.g. the translation apparatus), while reference sets composed solely of prokaryotic genomes fare better for more variable pathways such as carbohydrate metabolism. The differential performance of the PPC approach on various pathways and a weak positive correlation between functional and profile similarities suggest that caution should be exercised when interpreting functional linkages inferred from genome-wide, large-scale profile comparisons using a single reference set.

