Phylogenetic profiling: how much input data is enough?

Authors

Škunca Nives, Dessimoz Christophe

Affiliations

ETH Zürich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland; Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland; University College London, Gower St, London WC1E 6BT, UK.

University College London, Gower St, London WC1E 6BT, UK; Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland.

Publication

PLoS One. 2015 Feb 13;10(2):e0114701. doi: 10.1371/journal.pone.0114701. eCollection 2015.

DOI:10.1371/journal.pone.0114701
PMID:25679783
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4332489/
Abstract

Phylogenetic profiling is a well-established approach for predicting gene function based on patterns of gene presence and absence across species. Much of the recent development has focused on methodological improvements, but relatively little is known about the effect of input data size on the quality of predictions. In this work, we ask: how many genomes and functional annotations need to be considered for phylogenetic profiling to be effective? Phylogenetic profiling generally benefits from an increased amount of input data. However, by decomposing this improvement in predictive accuracy into the contributions of additional genomes and of additional annotations, we observed diminishing returns in adding more than ~100 genomes, whereas increasing the number of annotations remained strongly beneficial throughout. We also observed that maximising phylogenetic diversity within a clade of interest improves predictive accuracy, but the effect is small compared to changes in the number of genomes under comparison. Finally, we show that these findings also hold under the Open World Assumption, which posits that functional annotation databases are inherently incomplete. All the tools and data used in this work are available for reuse from http://lab.dessimoz.org/14_phylprof. Scripts used to analyse the data are available on request from the authors.
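The core idea in the first sentence, that genes with similar presence/absence patterns across genomes tend to share function, can be illustrated with a minimal nearest-neighbour sketch. It assumes each profile is a tuple of 0/1 presence calls over the same ordered set of genomes, uses Hamming distance as the similarity measure, and transfers labels from the k closest annotated genes. The function names, gene identifiers and labels F1 to F3 are hypothetical, and this is not the predictor evaluated in the paper.

```python
from collections import Counter

# Hypothetical example: profiles over six genomes (1 = gene present, 0 = absent)
# and made-up function labels F1 to F3.
EXAMPLE_ANNOTATED = {
    "g1": ((1, 1, 0, 0, 1, 0), {"F1"}),
    "g2": ((1, 1, 0, 0, 1, 1), {"F1", "F2"}),
    "g3": ((0, 0, 1, 1, 0, 1), {"F3"}),
    "g4": ((0, 1, 1, 1, 0, 1), {"F3"}),
}
EXAMPLE_QUERY = (1, 1, 0, 0, 1, 0)


def hamming(a, b):
    """Count the genomes in which two presence/absence profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes in the same order")
    return sum(x != y for x, y in zip(a, b))


def predict_terms(query, annotated, k=3):
    """Score function labels for `query` from its k most similar annotated genes.

    `annotated` maps gene_id -> (profile, set_of_labels). Each returned score is
    the fraction of the selected neighbours that carry the label.
    """
    # Rank reference genes by profile distance; break ties by gene id for determinism.
    ranked = sorted(annotated.items(), key=lambda item: (hamming(query, item[1][0]), item[0]))
    neighbours = ranked[:k]
    counts = Counter(label for _, (_, labels) in neighbours for label in labels)
    return {label: n / len(neighbours) for label, n in counts.items()}


if __name__ == "__main__":
    print(predict_terms(EXAMPLE_QUERY, EXAMPLE_ANNOTATED, k=2))
```

Run as a script, this prints `{'F1': 1.0, 'F2': 0.5}`: g1 and g2 have the closest profiles, both carry F1, and only g2 carries F2. Note that a label missing from a neighbour's annotation set lowers that label's score, so the sketch treats annotations as complete; under the Open World Assumption such scores are better read as underestimates.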

Similar articles

1
MycoBASE: expanding the functional annotation coverage of mycobacterial genomes.
BMC Genomics. 2015 Dec 24;16:1102. doi: 10.1186/s12864-015-2311-9.
2
Benchmarking gene ontology function predictions using negative annotations.
Bioinformatics. 2020 Jul 1;36(Suppl_1):i210-i218. doi: 10.1093/bioinformatics/btaa466.
3
Comparison of 19 Strains in Genomics, Phylogenetics, Phylogenomics and Functional Genomics.
Front Cell Infect Microbiol. 2017 Feb 14;7:28. doi: 10.3389/fcimb.2017.00028. eCollection 2017.
4
Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression.
BMC Genomics. 2017 Jun 26;18(1):483. doi: 10.1186/s12864-017-3887-z.
5
CodingQuarry: highly accurate hidden Markov model gene prediction in fungal genomes using RNA-seq transcripts.
BMC Genomics. 2015 Mar 11;16(1):170. doi: 10.1186/s12864-015-1344-4.
6
Comparative assessment of performance and genome dependence among phylogenetic profiling methods.
BMC Bioinformatics. 2006 Sep 27;7:420. doi: 10.1186/1471-2105-7-420.
7
OrthoFiller: utilising data from multiple species to improve the completeness of genome annotations.
BMC Genomics. 2017 May 18;18(1):390. doi: 10.1186/s12864-017-3771-x.
8
Modeling central metabolism and energy biosynthesis across microbial life.
BMC Genomics. 2016 Aug 8;17:568. doi: 10.1186/s12864-016-2887-8.
9
The quest for orthologs: finding the corresponding gene across genomes.
Trends Genet. 2008 Nov;24(11):539-51. doi: 10.1016/j.tig.2008.08.009. Epub 2008 Sep 24.

Cited by

1
ProTaxoVis-protein taxonomic visualisation of presence.
BMC Bioinformatics. 2025 May 19;26(1):128. doi: 10.1186/s12859-025-06146-9.
2
EvoWeaver: large-scale prediction of gene functional associations from coevolutionary signals.
Nat Commun. 2025 Apr 24;16(1):3878. doi: 10.1038/s41467-025-59175-6.
3
Assembling bacterial puzzles: piecing together functions into microbial pathways.