Phyletic profiling with cliques of orthologs is enhanced by signatures of paralogy relationships.

Affiliations

ETH Zurich, Computer Science, Zurich, Switzerland.

Publication information

PLoS Comput Biol. 2013;9(1):e1002852. doi: 10.1371/journal.pcbi.1002852. Epub 2013 Jan 3.

DOI: 10.1371/journal.pcbi.1002852

PMID: 23308060

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3536626/

Abstract

New microbial genomes are sequenced at a high pace, allowing insight into the genetics of not only cultured microbes, but a wide range of metagenomic collections such as the human microbiome. To understand the deluge of genomic data we face, computational approaches for gene functional annotation are invaluable. We introduce a novel model for computational annotation that refines two established concepts: annotation based on homology and annotation based on phyletic profiling. The phyletic profiling-based model that includes both inferred orthologs and paralogs (homologs separated by a speciation and a duplication event, respectively) provides more annotations at the same average Precision than the model that includes only inferred orthologs. For experimental validation, we selected 38 poorly annotated Escherichia coli genes for which the model assigned one of three GO terms with high confidence: involvement in DNA repair, protein translation, or cell wall synthesis. Results of antibiotic stress survival assays on E. coli knockout mutants showed high agreement with our model's estimates of accuracy: out of 38 predictions obtained at the reported Precision of 60%, we confirmed 25 predictions, indicating that our confidence estimates can be used to make informed decisions on experimental validation. Our work will contribute to making experimental validation of computational predictions more approachable, both in cost and time. Our predictions for 998 prokaryotic genomes include ~400,000 specific annotations with the estimated Precision of 90%, ~19,000 of which are highly specific (e.g. "penicillin binding," "tRNA aminoacylation for protein translation," or "pathogenesis"), and are freely available at http://gorbi.irb.hr/.
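As a rough illustration of the two quantities the abstract relies on, the sketch below builds toy phyletic profiles (presence/absence of a gene family across genomes) and recomputes the validation Precision from the reported counts (25 confirmed out of 38 predictions). The genome names, family counts, and the Jaccard similarity are assumptions chosen for illustration only; the abstract does not describe the authors' actual model, and this is not it.

```python
from typing import Dict, List


def phyletic_profile(hits: Dict[str, int], genomes: List[str]) -> List[int]:
    """Presence/absence vector: 1 if the family has at least one member in a genome."""
    return [1 if hits.get(g, 0) > 0 else 0 for g in genomes]


def jaccard(a: List[int], b: List[int]) -> float:
    """Share of genomes where both families occur, among genomes where either occurs."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def precision(confirmed: int, predicted: int) -> float:
    """Fraction of predictions that were experimentally confirmed."""
    if predicted <= 0:
        raise ValueError("predicted must be positive")
    return confirmed / predicted


# Hypothetical genomes and per-genome homolog counts (illustrative only).
# A count above 1 stands in, very crudely, for the presence of paralogs.
genomes = ["g1", "g2", "g3", "g4", "g5"]
family_a = {"g1": 1, "g2": 2, "g4": 1}
family_b = {"g1": 1, "g2": 1, "g4": 1, "g5": 1}

profile_a = phyletic_profile(family_a, genomes)
profile_b = phyletic_profile(family_b, genomes)
similarity = jaccard(profile_a, profile_b)

# Counts taken from the abstract: 25 of 38 predictions confirmed.
observed_precision = precision(25, 38)

print(profile_a, profile_b, similarity)
print(f"observed precision: {observed_precision:.3f} (reported estimate: 0.60)")
```

With the abstract's numbers, the observed fraction is 25/38, or about 0.66, which sits slightly above the reported 60% estimate.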

Similar articles

Gene fusions and gene duplications: relevance to genomic annotation and functional analysis.

BMC Genomics. 2005 Mar 9;6:33. doi: 10.1186/1471-2164-6-33.

Enrichment of Triticum aestivum gene annotations using ortholog cliques and gene ontologies in other plants.

BMC Genomics. 2015 Apr 15;16(1):299. doi: 10.1186/s12864-015-1496-2.

Orthologs, paralogs, and evolutionary genomics.

Annu Rev Genet. 2005;39:309-38. doi: 10.1146/annurev.genet.39.073003.114725.

Information theory applied to the sparse gene ontology annotation network to predict novel gene function.

Bioinformatics. 2007 Jul 1;23(13):i529-38. doi: 10.1093/bioinformatics/btm195.

PhylomeDB v3.0: an expanding repository of genome-wide collections of trees, alignments and phylogeny-based orthology and paralogy predictions.

Nucleic Acids Res. 2011 Jan;39(Database issue):D556-60. doi: 10.1093/nar/gkq1109. Epub 2010 Nov 12.

MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score.

Nucleic Acids Res. 2011 Mar;39(5):e32. doi: 10.1093/nar/gkq953. Epub 2010 Dec 11.

BMX: a tool for computing bacterial phyletic composition from orthologous maps.

BMC Res Notes. 2015 Feb 24;8:51. doi: 10.1186/s13104-015-1017-z.

Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.

J Mol Biol. 2001 Dec 14;314(5):1041-52. doi: 10.1006/jmbi.2000.5197.

Cited by

Role of the LuxR solo, SdiA, in eavesdropping on foreign bacteria.

FEMS Microbiol Rev. 2025 Jan 14;49. doi: 10.1093/femsre/fuaf015.

Identification of new SdiA regulon members of , , and serovars Typhimurium and Typhi.

Microbiol Spectr. 2024 Oct 22;12(12):e0192924. doi: 10.1128/spectrum.01929-24.

Assembling bacterial puzzles: piecing together functions into microbial pathways.

NAR Genom Bioinform. 2024 Aug 24;6(3):lqae109. doi: 10.1093/nargab/lqae109. eCollection 2024 Sep.

Interrogation of RNA-protein interaction dynamics in bacterial growth.

Mol Syst Biol. 2024 May;20(5):573-589. doi: 10.1038/s44320-024-00031-y. Epub 2024 Mar 26.

FinO/ProQ-family proteins: an evolutionary perspective.

Biosci Rep. 2023 Mar 31;43(3). doi: 10.1042/BSR20220313.

The minimal meningococcal ProQ protein has an intrinsic capacity for structure-based global RNA recognition.

Nat Commun. 2020 Jun 4;11(1):2823. doi: 10.1038/s41467-020-16650-6.

Identifying orthologs with OMA: A primer.

F1000Res. 2020 Jan 17;9:27. doi: 10.12688/f1000research.21508.1. eCollection 2020.

Combining learning and constraints for genome-wide protein annotation.

BMC Bioinformatics. 2019 Jun 17;20(1):338. doi: 10.1186/s12859-019-2875-5.

Quantifying changes in the bacterial thiol redox proteome during host-pathogen interaction.

Redox Biol. 2019 Feb;21:101087. doi: 10.1016/j.redox.2018.101087. Epub 2018 Dec 19.

The evolutionary signal in metagenome phyletic profiles predicts many gene functions.

Microbiome. 2018 Jul 10;6(1):129. doi: 10.1186/s40168-018-0506-4.

References

Quality of computationally inferred gene ontology annotations.

PLoS Comput Biol. 2012 May;8(5):e1002533. doi: 10.1371/journal.pcbi.1002533. Epub 2012 May 31.

Resolving the ortholog conjecture: orthologs tend to be weakly, but significantly, more similar in function than paralogs.

PLoS Comput Biol. 2012;8(5):e1002514. doi: 10.1371/journal.pcbi.1002514. Epub 2012 May 17.

On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report.

PLoS Comput Biol. 2012;8(2):e1002386. doi: 10.1371/journal.pcbi.1002386. Epub 2012 Feb 16.

Evidence-based annotation of gene function in Shewanella oneidensis MR-1 using genome-wide fitness profiling across 121 conditions.

PLoS Genet. 2011 Nov;7(11):e1002385. doi: 10.1371/journal.pgen.1002385. Epub 2011 Nov 17.

Reorganizing the protein space at the Universal Protein Resource (UniProt).

Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5. doi: 10.1093/nar/gkr981. Epub 2011 Nov 18.

EcoliWiki: a wiki-based community resource for Escherichia coli.

Nucleic Acids Res. 2012 Jan;40(Database issue):D1270-7. doi: 10.1093/nar/gkr880. Epub 2011 Nov 7.

Testing the ortholog conjecture with comparative functional genomic data from mammals.

PLoS Comput Biol. 2011 Jun;7(6):e1002073. doi: 10.1371/journal.pcbi.1002073. Epub 2011 Jun 9.

Enterotypes of the human gut microbiome.

Nature. 2011 May 12;473(7346):174-80. doi: 10.1038/nature09944. Epub 2011 Apr 20.

COMBREX: COMputational BRidge to EXperiments.

Biochem Soc Trans. 2011 Apr;39(2):581-3. doi: 10.1042/BST0390581.

Phenotypic landscape of a bacterial cell.

Cell. 2011 Jan 7;144(1):143-56. doi: 10.1016/j.cell.2010.11.052. Epub 2010 Dec 23.
