Training set expansion: an approach to improving the reconstruction of biological networks from limited and uneven reliable interactions.

Authors

Yip Kevin Y, Gerstein Mark

Affiliation

Department of Computer Science, Yale University, New Haven, CT 06511, USA.

Publication information

Bioinformatics. 2009 Jan 15;25(2):243-50. doi: 10.1093/bioinformatics/btn602. Epub 2008 Nov 17.

DOI:10.1093/bioinformatics/btn602

PMID:19015141

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2639005/

Abstract

MOTIVATION

An important problem in systems biology is reconstructing complete networks of interactions between biological objects by extrapolating from a few known interactions as examples. While there are many computational techniques proposed for this network reconstruction task, their accuracy is consistently limited by the small number of high-confidence examples, and the uneven distribution of these examples across the potential interaction space, with some objects having many known interactions and others few.

RESULTS

To address this issue, we propose two computational methods based on the concept of training set expansion. They work particularly effectively in conjunction with kernel approaches, which are a popular class of approaches for fusing together many disparate types of features. Both our methods are based on semi-supervised learning and involve augmenting the limited number of gold-standard training instances with carefully chosen and highly confident auxiliary examples. The first method, prediction propagation, propagates highly confident predictions of one local model to another as the auxiliary examples, thus learning from information-rich regions of the training network to help predict the information-poor regions. The second method, kernel initialization, takes the most similar and most dissimilar objects of each object in a global kernel as the auxiliary examples. Using several sets of experimentally verified protein-protein interactions from yeast, we show that training set expansion gives a measurable performance gain over a number of representative, state-of-the-art network reconstruction methods, and it can correctly identify some interactions that are ranked low by other methods due to the lack of training examples of the involved proteins.
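ILLUSTRATION

The kernel initialization idea described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `kernel_initialization`, the parameter `k` (how many auxiliary examples to take per object), and the toy kernel matrix are all hypothetical. The abstract does not say how many auxiliary examples are chosen or how they are weighted. The sketch assumes a symmetric similarity matrix and 2k no larger than the number of other objects, so the two auxiliary sets do not overlap.

```python
def kernel_initialization(kernel, k=1):
    """Pick auxiliary training partners for each object from a global kernel.

    kernel: square, symmetric list of lists; kernel[i][j] is the similarity
            between objects i and j (hypothetical toy input).
    k:      number of auxiliary positives and negatives per object
            (an assumption; the abstract does not specify this).
    Returns two dicts mapping object index -> list of partner indices.
    """
    n = len(kernel)
    aux_pos, aux_neg = {}, {}
    for i in range(n):
        # Every object except i itself is a candidate partner.
        others = [j for j in range(n) if j != i]
        # Rank candidates from most similar to least similar to object i.
        ranked = sorted(others, key=lambda j: kernel[i][j], reverse=True)
        aux_pos[i] = ranked[:k]   # most similar -> auxiliary positives
        aux_neg[i] = ranked[-k:]  # most dissimilar -> auxiliary negatives
    return aux_pos, aux_neg


# Toy 4-object kernel: objects 0 and 1 are similar, as are objects 2 and 3.
K = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.4],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.4, 0.8, 1.0],
]
aux_pos, aux_neg = kernel_initialization(K, k=1)
print(aux_pos)
print(aux_neg)
```

In a fuller pipeline, these auxiliary pairs would be added to each object's gold-standard examples before a local model is trained. That training step is omitted here because the abstract does not describe it.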
