IEEE/ACM Trans Comput Biol Bioinform. 2019 Mar-Apr;16(2):607-616. doi: 10.1109/TCBB.2017.2777448. Epub 2017 Nov 24.
Protein-protein interaction (PPI) identification is an important task in text mining. Most PPI detection systems make predictions based solely on evidence within a single sentence, and they often depend on costly manual annotation. This paper approaches the PPI detection task from a different paradigm by investigating the contexts of protein pairs collected from a large corpus, together with the relations between those pairs. First, crucial cues in the context are exploited to make initial predictions. Then, relational similarity between protein pairs is calculated. Finally, evidence from the two views is integrated within the framework of a minimum cut algorithm. Experimental results show that the graph model achieves better performance than standard supervised approaches. Trained on 20 percent of the data, our algorithm achieves higher accuracy than a support vector machine (SVM) trained on 80 percent of the data. Moreover, the semi-supervised settings reveal promising directions for PPI identification that exploit unlabeled data.
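To make the integration step concrete, the following is a minimal sketch of how per-pair initial predictions and pairwise relational similarities can be combined with an s-t minimum cut. It is a generic min-cut classification formulation, not the paper's exact construction, which the abstract does not specify. The function name `min_cut_labels`, the score and similarity values, and the use of a hard constraint for labeled pairs are all assumptions made for illustration.

```python
from collections import defaultdict, deque

# Hypothetical terminal node names; they must not collide with item names.
SOURCE, SINK = "__interacting__", "__non_interacting__"


def min_cut_labels(scores, similarities, labeled=None, eps=1e-12):
    """Label each protein pair as interacting (True) or not (False).

    scores:       {pair: initial probability of interaction in [0, 1]}
    similarities: {(pair_a, pair_b): non-negative similarity weight};
                  both pairs are assumed to appear in `scores`
    labeled:      optional {pair: bool} of known labels, enforced as
                  hard constraints (an assumption of this sketch)
    """
    labeled = labeled or {}
    # A capacity larger than any possible finite cut, so it is never cut.
    hard = 1.0 + len(scores) + sum(similarities.values())

    cap = defaultdict(lambda: defaultdict(float))
    for item, p in scores.items():
        if item in labeled:
            cap[SOURCE][item] = hard if labeled[item] else 0.0
            cap[item][SINK] = 0.0 if labeled[item] else hard
        else:
            cap[SOURCE][item] = p          # cost of calling it non-interacting
            cap[item][SINK] = 1.0 - p      # cost of calling it interacting
    for (a, b), w in similarities.items():
        cap[a][b] += w                     # cost of separating similar pairs
        cap[b][a] += w

    def find_path():
        """Breadth-first search for an augmenting path in the residual graph."""
        parent = {SOURCE: None}
        queue = deque([SOURCE])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    if v == SINK:
                        return parent
                    queue.append(v)
        return None

    # Edmonds-Karp: push flow along shortest augmenting paths until none remain.
    while True:
        parent = find_path()
        if parent is None:
            break
        flow, v = float("inf"), SINK
        while parent[v] is not None:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = SINK
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    # Nodes still reachable from the source form the "interacting" side.
    reachable, queue = {SOURCE}, deque([SOURCE])
    while queue:
        u = queue.popleft()
        for v, c in cap[u].items():
            if c > eps and v not in reachable:
                reachable.add(v)
                queue.append(v)
    return {item: item in reachable for item in scores}


# Illustrative values only; these are not data from the paper.
scores = {"A-B": 0.9, "C-D": 0.45, "E-F": 0.1}
similarities = {("A-B", "C-D"): 0.5}
print(min_cut_labels(scores, similarities))
```

In this toy example, the pair `C-D` has an initial score just below 0.5 and would be labeled non-interacting on its own, but its similarity to the confidently interacting pair `A-B` makes the cheapest cut place both on the interacting side. This is the sense in which the two views of evidence can correct each other, and passing known labels through `labeled` gives one simple way to fold a small training set into the same graph.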