Graph sharpening plus graph integration: a synergy that improves protein functional classification.

Author information

Shin Hyunjung, Lisewski Andreas Martin, Lichtarge Olivier

Affiliation

Department of Industrial & Information Systems Engineering, Ajou University, San 5, Wonchun-dong, Yeoungtong-gu, 443-749, Suwon, Korea.

Publication information

Bioinformatics. 2007 Dec 1;23(23):3217-24. doi: 10.1093/bioinformatics/btm511. Epub 2007 Oct 31.

DOI:10.1093/bioinformatics/btm511

PMID:17977886

Abstract

MOTIVATION

Predicting protein function is a central problem in bioinformatics, and many approaches use partially or fully automated methods based on various combinations of sequence, structure and other information on proteins or genes. Such information establishes relationships between proteins that can be modelled most naturally as edges in graphs. A priori, however, it is often unclear which edges from which graph contribute most to accurate predictions. For that reason, one established strategy, known as graph integration, is to combine all available sources (graphs) in the hope that the positive signals will add to each other. However, in functional prediction, noise, i.e. the presence of inaccurate or false edges, can still be large enough that integration alone has little effect on prediction accuracy. To reduce noise levels and to improve integration efficiency, we present here a recent method in graph-based learning, graph sharpening, which provides a theoretically firm yet intuitive and practical approach for disconnecting undesirable edges from protein similarity graphs. This approach has several attractive features: it is quick, scalable in the number of proteins, robust with respect to errors and tolerant of very diverse types of protein similarity measures.
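The idea of sharpening each similarity graph before integrating can be illustrated with a small sketch. The abstract does not state which edges count as undesirable or how the graphs are combined, so everything below is an assumption made for illustration: the two removal rules in `sharpen`, the uniform weighting in `integrate`, the regularised Laplacian solve in `propagate`, and the toy matrices `seq_graph` and `struct_graph` are hypothetical and are not taken from the paper.

```python
import numpy as np

def sharpen(W, y):
    """Remove 'undesirable' edges from a similarity graph (illustrative rules only).

    W[i, j] is how strongly node j influences node i.
    y holds +1 / -1 for labeled proteins and 0 for unlabeled ones.
    """
    S = np.array(W, dtype=float)
    y = np.asarray(y)
    labeled = y != 0
    # Assumed rule 1: labeled nodes take no influence from unlabeled nodes.
    S[np.ix_(labeled, ~labeled)] = 0.0
    # Assumed rule 2: drop edges between nodes carrying opposite labels.
    S[np.outer(y, y) < 0] = 0.0
    return S

def integrate(graphs, weights=None):
    """Combine several graphs as a weighted sum (uniform weights assumed by default)."""
    graphs = [np.asarray(g, dtype=float) for g in graphs]
    if weights is None:
        weights = np.full(len(graphs), 1.0 / len(graphs))
    return sum(w * g for w, g in zip(weights, graphs))

def propagate(W, y, mu=1.0):
    """Score every node by solving (I + mu * L) f = y, with L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(len(y)) + mu * L, np.asarray(y, dtype=float))

# Toy data: protein 0 is a known positive, protein 3 a known negative.
y = np.array([1, 0, 0, -1])
# Hypothetical sequence graph: a chain 0-1-2-3 plus one noisy edge between 0 and 3.
seq_graph = np.array([[0, 1, 0, 1],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [1, 0, 1, 0]])
# Hypothetical structure graph: the same chain without the noisy edge.
struct_graph = np.array([[0, 1, 0, 0],
                         [1, 0, 1, 0],
                         [0, 1, 0, 1],
                         [0, 0, 1, 0]])

combined = integrate([sharpen(seq_graph, y), sharpen(struct_graph, y)])
scores = propagate(combined, y)
```

In this toy case sharpening deletes the noisy edge between the two oppositely labeled proteins, so the unlabeled proteins 1 and 2 receive scores of opposite sign according to their nearer labeled neighbour.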

RESULTS

We tested the classification accuracy on a test set of 599 proteins with remote sequence homology spread over 20 Gene Ontology (GO) functional classes. When compared to integration alone, graph sharpening plus integration of four vastly different molecular similarity measures improved the overall classification by nearly 30% [0.17 average increase in the area under the ROC curve (AUC)]. Moreover, and partially through the increased sparsity of the graphs induced by sharpening, this gain in accuracy came at negligible computational cost: sharpening and integration took on average 4.66 (±4.44) CPU seconds.
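For readers unfamiliar with the metric, the AUC reported above can be read as the probability that a randomly chosen positive protein is scored higher than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the function name `auc` and the example scores are made up for illustration and do not come from the paper's data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half a win)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l != 1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for four proteins in one functional class.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 3 of 4 positive/negative pairs ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a scale for the 0.17 average increase quoted above.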

AVAILABILITY

Software and Supplementary data will be available on http://mammoth.bcm.tmc.edu/

Similar articles

Annotating proteins by mining protein interaction networks.

Bioinformatics. 2006 Jul 15;22(14):e260-70. doi: 10.1093/bioinformatics/btl221.

Protein structural similarity search by Ramachandran codes.

BMC Bioinformatics. 2007 Aug 23;8:307. doi: 10.1186/1471-2105-8-307.

SherLoc: high-accuracy prediction of protein subcellular localization by integrating text and protein sequence data.

Bioinformatics. 2007 Jun 1;23(11):1410-7. doi: 10.1093/bioinformatics/btm115. Epub 2007 Mar 28.

J Biomed Inform. 2008 Feb;41(1):65-81. doi: 10.1016/j.jbi.2007.05.010. Epub 2007 Jun 27.

On the quality of tree-based protein classification.

Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.

Context-sensitive data integration and prediction of biological networks.

Bioinformatics. 2007 Sep 1;23(17):2322-30. doi: 10.1093/bioinformatics/btm332. Epub 2007 Jun 28.

The global trace graph, a novel paradigm for searching protein sequence databases.

Bioinformatics. 2007 Sep 15;23(18):2361-7. doi: 10.1093/bioinformatics/btm358. Epub 2007 Sep 6.

Blast sampling for structural and functional analyses.

BMC Bioinformatics. 2007 Feb 23;8:62. doi: 10.1186/1471-2105-8-62.

Protein function prediction via graph kernels.

Bioinformatics. 2005 Jun;21 Suppl 1:i47-56. doi: 10.1093/bioinformatics/bti1007.

Cited by

MoNETA: MultiOmics Network Embedding for SubType Analysis.

NAR Genom Bioinform. 2024 Oct 16;6(4):lqae141. doi: 10.1093/nargab/lqae141. eCollection 2024 Sep.

A Machine Learning-Based Approach Using Multi-omics Data to Predict Metabolic Pathways.

Methods Mol Biol. 2023;2553:441-452. doi: 10.1007/978-1-0716-2617-7_19.

Machine learning: its challenges and opportunities in plant system biology.

Appl Microbiol Biotechnol. 2022 May;106(9-10):3507-3530. doi: 10.1007/s00253-022-11963-6. Epub 2022 May 16.

Multiple Profile Models Extract Features from Protein Sequence Data and Resolve Functional Diversity of Very Different Protein Families.

Mol Biol Evol. 2022 Apr 10;39(4). doi: 10.1093/molbev/msac070.

A general calculus of fitness landscapes finds genes under selection in cancers.

Genome Res. 2022 May;32(5):916-929. doi: 10.1101/gr.275811.121. Epub 2022 Mar 17.

Integration of anatomy ontology data with protein-protein interaction networks improves the candidate gene prediction accuracy for anatomical entities.

BMC Bioinformatics. 2020 Oct 7;21(1):442. doi: 10.1186/s12859-020-03773-2.

The translational network for metabolic disease - from protein interaction to disease co-occurrence.

BMC Bioinformatics. 2019 Nov 13;20(1):576. doi: 10.1186/s12859-019-3106-9.

Drug repurposing with network reinforcement.

BMC Bioinformatics. 2019 Jul 24;20(Suppl 13):383. doi: 10.1186/s12859-019-2858-6.

A comparison of graph- and kernel-based -omics data integration algorithms for classifying complex traits.

BMC Bioinformatics. 2017 Dec 6;18(1):539. doi: 10.1186/s12859-017-1982-4.

An inference method from multi-layered structure of biomedical data.

BMC Med Inform Decis Mak. 2017 May 18;17(Suppl 1):52. doi: 10.1186/s12911-017-0450-4.
