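Mycobacterium tuberculosis and Clostridium difficille interactomes: demonstration of rapid development of computational system for bacterial interactome prediction.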

Ananthasubramanian Seshan, Metri Rahul, Khetan Ankur, Gupta Aman, Handen Adam, Chandra Nagasuma, Ganapathiraju Madhavi

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh 15260, USA.

Intelligent Systems Program, University of Pittsburgh, Pittsburgh 15260, USA.

Microb Inform Exp. 2012 Mar 21;2:4. doi: 10.1186/2042-5783-2-4.

BACKGROUND

Protein-protein interaction (PPI) networks (interactomes) of most organisms, except for some model organisms, are largely unknown. Experimental methods, including high-throughput techniques, are highly resource intensive. Computational discovery of PPIs can therefore accelerate biological discovery by presenting the "most-promising" pairs of proteins that are likely to interact. For many bacteria, the genome sequence, and thereby the genomic context of the proteome, is readily available; for some of these proteomes, localization and functional annotations are also available, but interactomes are not. We present here a method for the rapid development of a computational system to predict the interactome of a bacterial proteome. While other studies have presented methods to transfer interologs across species, we propose the transfer of computational models to benefit from cross-species annotations, thereby predicting many more novel interactions even in the absence of interologs. Mycobacterium tuberculosis (Mtb) and Clostridium difficile (CD) are used to demonstrate the work.

RESULTS

We developed a random forest classifier over features derived from Gene Ontology annotations and genetic context scores provided by the STRING database, and used it to predict Mtb and CD interactions independently. The Mtb classifier gave a precision of 94% and a recall of 23% on a held-out test set. The Mtb model was then run on all 8 million protein pairs of the Mtb proteome, resulting in 708 new interactions at 94% expected precision, or 1,595 new interactions at 80% expected precision. The CD classifier gave a precision of 90% and a recall of 16% on a held-out test set. The CD model was run on all 8 million protein pairs of the CD proteome, resulting in 143 new interactions at 90% expected precision, or 580 new interactions at 80% expected precision. We also compared the overlap of our predictions with STRING database interactions for CD and Mtb, and with interactions identified recently by a bacterial 2-hybrid system for Mtb. To demonstrate the utility of transferring computational models, we applied the Mtb model to CD protein pairs. This cross-species model yielded a precision of 88% at a recall of 8%. To demonstrate the transfer of features from other organisms when both feature-based and interaction-based information is lacking, we transferred missing feature values from Mtb orthologs into the CD data. By transferring this data from orthologs (not interologs), we showed that a large number of interactions can be predicted.
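The paired figures above (for example, 708 interactions at 94% expected precision versus 1,595 at 80%) reflect a trade-off set by the cutoff applied to the classifier score. The following is a minimal sketch of that trade-off, not the authors' code: the function names, the toy scores, and the toy labels are all hypothetical, and it assumes only that each protein pair receives a numeric score and that held-out pairs carry a 0/1 label.

```python
from typing import List, Optional, Tuple


def precision_recall_at(scores: List[float], labels: List[int],
                        threshold: float) -> Tuple[float, float]:
    """Precision and recall when pairs scoring >= threshold are called interacting."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def lowest_threshold_for_precision(scores: List[float], labels: List[int],
                                   target: float) -> Optional[float]:
    """Smallest observed score cutoff whose held-out precision reaches the target."""
    for t in sorted(set(scores)):
        precision, _ = precision_recall_at(scores, labels, t)
        if precision >= target:
            return t
    return None  # no cutoff reaches the requested precision


# Hypothetical held-out scores and labels (1 = known interaction).
toy_scores = [0.95, 0.9, 0.85, 0.7, 0.6, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 1, 0, 1, 0]

for target in (0.8, 0.9):
    cutoff = lowest_threshold_for_precision(toy_scores, toy_labels, target)
    print(target, cutoff, precision_recall_at(toy_scores, toy_labels, cutoff))
```

On this toy data, raising the precision target from 0.8 to 0.9 moves the cutoff up and lowers recall, which mirrors how a stricter expected precision yields fewer predicted interactions.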

CONCLUSIONS

Rapid discovery of a (partial) bacterial interactome is possible using the existing set of GO and STRING features associated with the organism. When there are not even enough known interactions to develop a computational prediction system, cross-species interactome development can be used: the computational model of one or more well-studied organisms can make the initial interactome prediction for the target organism. We have also demonstrated that annotations can be transferred from orthologs in well-studied organisms, enabling accurate predictions for organisms with no annotations. These approaches can serve as building blocks for addressing the challenges of feature coverage and missing interactions on the way to rapid interactome discovery for bacterial organisms.
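The ortholog-based transfer described above can be illustrated with a small sketch. It is an assumption-laden simplification rather than the published pipeline: features are stored per protein instead of per protein pair, and the protein identifiers, feature names, and ortholog map are hypothetical placeholders.

```python
from typing import Dict, Optional

# Feature table: protein id -> {feature name -> value, or None when missing}.
FeatureTable = Dict[str, Dict[str, Optional[float]]]


def transfer_missing_features(target: FeatureTable, source: FeatureTable,
                              orthologs: Dict[str, str]) -> FeatureTable:
    """Fill only the missing target features from each protein's source ortholog.

    Values already present in the target are never overwritten, and proteins
    without a mapped ortholog are returned unchanged.
    """
    filled: FeatureTable = {}
    for protein, features in target.items():
        merged = dict(features)
        source_features = source.get(orthologs.get(protein, ""), {})
        for name, value in features.items():
            if value is None and source_features.get(name) is not None:
                merged[name] = source_features[name]
        filled[protein] = merged
    return filled


# Hypothetical data: "cd_A" has an Mtb ortholog, "cd_B" does not.
cd_features: FeatureTable = {
    "cd_A": {"go_score": None, "context_score": 0.4},
    "cd_B": {"go_score": None, "context_score": None},
}
mtb_features: FeatureTable = {"mtb_X": {"go_score": 0.7, "context_score": 0.9}}
ortholog_map = {"cd_A": "mtb_X"}

print(transfer_missing_features(cd_features, mtb_features, ortholog_map))
```

Keeping the target's own values and filling only the gaps means the transfer adds feature coverage without replacing organism-specific evidence.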

AVAILABILITY

The predictions for all Mtb and CD proteins are made available at: http://severus.dbmi.pitt.edu/TB and http://severus.dbmi.pitt.edu/CD respectively for browsing as well as for download.
