NEC Labs America, Princeton, NJ 08540, USA.
Bioinformatics. 2010 Sep 15;26(18):i645-52. doi: 10.1093/bioinformatics/btq394.
Protein-protein interactions (PPIs) are critical for virtually every biological function. Recently, researchers have proposed using supervised learning to classify pairs of proteins as interacting or not. However, the performance of such classifiers is largely restricted by the limited availability of protein pairs known to truly interact (labeled). Meanwhile, there is a considerable number of protein pairs for which an association between the two partners has been reported, but for which there is not enough experimental evidence to support a direct interaction (partially labeled).
We propose a semi-supervised multi-task framework for predicting PPIs from both labeled and partially labeled reference sets. The basic idea is to perform multi-task learning on a supervised classification task and a semi-supervised auxiliary task. The supervised classifier trains a multi-layer perceptron network for PPI prediction from labeled examples. The semi-supervised auxiliary task shares network layers with the supervised classifier and is trained on partially labeled examples. Semi-supervision can be exploited in several ways; we tried three approaches in this article: (i) classification (distinguishing partial positives from negatives); (ii) ranking (rating partial positives as more likely to interact than negatives); and (iii) embedding (encouraging examples in the same data cluster to receive similar labels). We applied this framework to improve the identification of interacting pairs between HIV-1 and human proteins. Our method improved upon the state-of-the-art method for this task, indicating the benefits of semi-supervised multi-task learning with auxiliary information.
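The shared-layer idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the single tanh hidden layer, the separate output heads, the logistic and margin-based losses, the `aux_weight` trade-off and all function names are assumptions made for illustration, and only the ranking variant (ii) of the auxiliary task is shown. The sketch computes the combined objective only; training would additionally require gradients with respect to the parameters.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def make_params(n_features, n_hidden, seed=0):
    """Hypothetical parameter layout: one shared layer, one head per task."""
    rng = random.Random(seed)
    return {
        "shared": init_layer(n_features, n_hidden, rng),
        "main_head": init_layer(n_hidden, 1, rng),
        "aux_head": init_layer(n_hidden, 1, rng),
    }


def linear(layer, x):
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def shared_hidden(params, x):
    """Hidden representation used by BOTH the supervised and auxiliary task."""
    return [math.tanh(h) for h in linear(params["shared"], x)]


def main_score(params, x):
    return linear(params["main_head"], shared_hidden(params, x))[0]


def aux_score(params, x):
    return linear(params["aux_head"], shared_hidden(params, x))[0]


def logistic_loss(score, label):
    """Binary cross-entropy on a raw score; label is 0 or 1."""
    z = score if label == 1 else -score
    # Numerically stable form of log(1 + exp(-z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Zero once the partial positive outscores the negative by `margin`."""
    return max(0.0, margin - pos_score + neg_score)


def multitask_loss(params, labeled, partial_pos, negatives, aux_weight=0.5):
    """Supervised loss on labeled pairs plus weighted ranking loss on
    (partial positive, negative) pairs. Returns (total, supervised, auxiliary)."""
    sup = sum(logistic_loss(main_score(params, x), y) for x, y in labeled)
    sup /= len(labeled)
    pairs = [(p, n) for p in partial_pos for n in negatives]
    aux = sum(
        margin_ranking_loss(aux_score(params, p), aux_score(params, n))
        for p, n in pairs
    )
    aux /= len(pairs)
    return sup + aux_weight * aux, sup, aux


# Toy feature vectors for protein pairs (made-up numbers, for illustration only).
params = make_params(n_features=3, n_hidden=4)
labeled = [([1.0, 0.0, 0.5], 1), ([0.0, 1.0, 0.2], 0)]
partial_pos = [[0.9, 0.1, 0.4]]
negatives = [[0.1, 0.8, 0.3]]
total, sup, aux = multitask_loss(params, labeled, partial_pos, negatives)
```

Because both scoring functions pass through `shared_hidden`, any update that lowers the auxiliary ranking loss also changes the representation seen by the supervised classifier, which is how partially labeled pairs can influence PPI prediction in this kind of setup.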