Multi-level learning: improving the prediction of protein, domain and residue interactions by allowing information flow between levels.

Authors

Yip Kevin Y, Kim Philip M, McDermott Drew, Gerstein Mark

Affiliation

Department of Computer Science, Yale University, 51 Prospect Street, New Haven, CT 06511, USA.

Publication information

BMC Bioinformatics. 2009 Aug 5;10:241. doi: 10.1186/1471-2105-10-241.

PMID: 19656385

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2734556/

Abstract

BACKGROUND

Proteins interact through specific binding interfaces, which consist of many residues located within domains. Protein interactions thus occur on three different levels of a concept hierarchy: whole proteins, domains, and residues. Each level offers a distinct and complementary set of features for computationally predicting interactions, including functional genomic features of whole proteins, evolutionary features of domain families, and physical-chemical features of individual residues. The predictions at each level could benefit from using the features at all three levels. However, combining them is not trivial, because the features are provided at different granularities.

RESULTS

To link up the predictions at the three levels, we propose a multi-level machine-learning framework that allows for explicit information flow between the levels. We demonstrate, using representative yeast interaction networks, that our algorithm is able to utilize complementary feature sets to make more accurate predictions at the three levels than when the three problems are approached independently. To facilitate application of our multi-level learning framework, we discuss three key aspects of multi-level learning and the corresponding design choices that we have made in the implementation of a concrete learning algorithm. 1) Architecture of information flow: we show the greater flexibility of bidirectional flow over independent levels and unidirectional flow. 2) Coupling mechanism of the different levels: we show how the levels can be coupled by augmenting the training sets at each level, and discuss how soft coupling prevents error propagation between levels. 3) Sparseness of data: we show that the multi-level framework compounds data sparsity issues, and discuss how this can be dealt with by building local models in information-rich parts of the data. Our proof-of-concept learning algorithm demonstrates the advantage of combining levels, and opens up opportunities for further research.
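The coupling idea in point 2 can be illustrated with a small sketch. This is not the authors' Java implementation: the function names (`propagate_up`, `propagate_down`, `augment`), the confidence thresholds, and the toy identifiers are all hypothetical, chosen only to show one way that predictions at one level could become weighted ("soft") training examples at a neighboring level. It assumes that a confidently predicted interaction between two residues suggests an interaction between their parent domains, and that a confidently predicted non-interaction between two domains suggests non-interaction between their residues.

```python
from itertools import product


def propagate_up(child_scores, parent_of, min_conf=0.8):
    """Map confident child-level positives to weighted parent-level positives.

    child_scores: {(child_a, child_b): score in [0, 1]}
    parent_of:    {child: parent}
    min_conf is an assumed threshold, not a value from the paper.
    """
    soft = {}
    for (a, b), score in child_scores.items():
        if score < min_conf:
            continue  # only confident predictions are allowed to flow upward
        pair = tuple(sorted((parent_of[a], parent_of[b])))
        soft[pair] = max(soft.get(pair, 0.0), score)
    return soft


def propagate_down(parent_scores, children_of, max_conf=0.2):
    """Map confident parent-level negatives to weighted child-level negatives."""
    soft = {}
    for (p, q), score in parent_scores.items():
        if score > max_conf:
            continue  # only confident non-interactions flow downward
        weight = 1.0 - score
        for a, b in product(children_of[p], children_of[q]):
            pair = tuple(sorted((a, b)))
            soft[pair] = max(soft.get(pair, 0.0), weight)
    return soft


def augment(gold, soft, label):
    """Add soft examples to a training set as (label, weight) entries.

    Known (gold) labels keep weight 1.0 and are never overwritten, so a
    wrong prediction from another level can only add a down-weighted example.
    """
    out = {pair: (lab, 1.0) for pair, lab in gold.items()}
    for pair, weight in soft.items():
        out.setdefault(pair, (label, weight))
    return out


# Toy data: residue-level predictions flowing up to the domain level.
residue_scores = {("r1", "r7"): 0.95, ("r2", "r8"): 0.40}
domain_of = {"r1": "D1", "r2": "D1", "r7": "D2", "r8": "D3"}
domain_gold = {("D1", "D3"): 0}

up = propagate_up(residue_scores, domain_of)
domain_train = augment(domain_gold, up, label=1)

# Toy data: domain-level predictions flowing down to the residue level.
domain_scores = {("D1", "D3"): 0.125}
residues_of = {"D1": ["r1", "r2"], "D3": ["r8"]}
down = propagate_down(domain_scores, residues_of)
```

In this sketch the weight attached to each propagated example plays the role of soft coupling: a learner that supports per-example weights would trust a propagated example less than a known one, which limits how far an error at one level can spread to another.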

AVAILABILITY

The software and a readme file can be downloaded at http://networks.gersteinlab.org/mll. The programs are written in Java, and can be run on any platform with Java 1.4 or higher and Apache Ant 1.7.0 or higher installed. The software can be used without a license.
