Multi-level learning: improving the prediction of protein, domain and residue interactions by allowing information flow between levels.

Authors

Yip Kevin Y, Kim Philip M, McDermott Drew, Gerstein Mark

Affiliation

Department of Computer Science, Yale University, 51 Prospect Street, New Haven, CT 06511, USA.

Publication information

BMC Bioinformatics. 2009 Aug 5;10:241. doi: 10.1186/1471-2105-10-241.

Abstract

BACKGROUND

Proteins interact through specific binding interfaces made up of many residues within domains. Protein interactions thus occur on three different levels of a concept hierarchy: whole proteins, domains, and residues. Each level offers a distinct and complementary set of features for computationally predicting interactions, including functional genomic features of whole proteins, evolutionary features of domain families, and physical-chemical features of individual residues. The predictions at each level could benefit from using the features at all three levels. However, combining them is not trivial because the features are provided at different granularities.

RESULTS

To link the predictions at the three levels, we propose a multi-level machine-learning framework that allows explicit information flow between the levels. We demonstrate, using representative yeast interaction networks, that our algorithm can use complementary feature sets to make more accurate predictions at the three levels than when the three problems are approached independently. To facilitate application of our multi-level learning framework, we discuss three key aspects of multi-level learning and the corresponding design choices that we have made in the implementation of a concrete learning algorithm. 1) Architecture of information flow: we show the greater flexibility of bidirectional flow over independent levels and unidirectional flow. 2) Coupling mechanism of the different levels: we show how coupling can be accomplished by augmenting the training sets at each level, and discuss how soft coupling prevents error propagation between levels. 3) Sparseness of data: we show that the multi-level framework compounds data sparsity issues, and discuss how this can be dealt with by building local models in information-rich parts of the data. Our proof-of-concept learning algorithm demonstrates the advantage of combining levels, and opens up opportunities for further research.
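The coupling idea in point 2 can be illustrated with a small Python sketch. It is not the authors' Java implementation. The names `Example`, `propagate_up`, and `augment`, the noisy-OR rule for combining domain-level scores into protein-level scores, and the 0.8 confidence threshold are all assumptions chosen for illustration. The sketch shows only upward flow (domains to proteins); downward flow would need an analogous mapping in the other direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    pair: tuple    # (id_a, id_b) at one level of the hierarchy
    label: int     # 1 = interacting, 0 = non-interacting
    weight: float  # 1.0 for known examples, below 1.0 for propagated ones


def propagate_up(child_scores, parent_of):
    """Combine child-level pair scores into parent-level scores.

    Uses a noisy-OR (an illustrative assumption): a parent pair is scored
    as interacting if at least one of its child pairs interacts.
    """
    remaining = {}  # parent pair -> probability that no child pair interacts
    for (a, b), p in child_scores.items():
        parent = tuple(sorted((parent_of[a], parent_of[b])))
        remaining[parent] = remaining.get(parent, 1.0) * (1.0 - p)
    return {parent: 1.0 - r for parent, r in remaining.items()}


def augment(training_set, propagated_scores, threshold=0.8):
    """Soft coupling: add only confident propagated predictions.

    Each added example is weighted by its confidence, so an uncertain
    prediction from another level cannot act like a known label.
    """
    known = {ex.pair for ex in training_set}
    augmented = list(training_set)
    for pair, p in sorted(propagated_scores.items()):
        if pair in known:
            continue  # never overwrite a known example
        confidence = max(p, 1.0 - p)
        if confidence >= threshold:
            augmented.append(Example(pair, int(p >= 0.5), confidence))
    return augmented


if __name__ == "__main__":
    # Hypothetical toy data: four domains belonging to three proteins.
    parent_of = {"d1": "P1", "d2": "P2", "d3": "P1", "d4": "P3"}
    domain_scores = {("d1", "d2"): 0.9, ("d3", "d2"): 0.5, ("d3", "d4"): 0.3}
    protein_scores = propagate_up(domain_scores, parent_of)
    protein_training = [Example(("P2", "P3"), 1, 1.0)]
    for ex in augment(protein_training, protein_scores):
        print(ex)
```

In this toy run, the protein pair supported by a confident domain-level prediction joins the protein-level training set with a weight below 1.0, while the weakly supported pair is left out. Filtering and down-weighting in this way is one simple reading of soft coupling as a guard against error propagation.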

AVAILABILITY

The software and a readme file can be downloaded at http://networks.gersteinlab.org/mll. The programs are written in Java, and can be run on any platform with Java 1.4 or higher and Apache Ant 1.7.0 or higher installed. The software can be used without a license.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/538d/2734556/cd0a1a545c2f/1471-2105-10-241-1.jpg
