College of Computer Science and Technology, Dalian University of Technology, Dalian, China.
IEEE Trans Nanobioscience. 2013 Sep;12(3):173-81. doi: 10.1109/TNB.2013.2263837. Epub 2013 Aug 21.
Protein-protein interactions (PPIs) play a key role in many aspects of the structural and functional organization of the cell, and knowledge of them helps reveal the molecular mechanisms of biological processes. However, the biomedical literature on protein interactions is growing rapidly, which makes it difficult for interaction database curators to detect and curate PPI information manually. In this paper, we present PPIExtractor, a system that automatically extracts PPIs from biomedical text and visualizes them. Given a dataset of Medline records, PPIExtractor first applies Feature Coupling Generalization (FCG) to tag protein names in the text, then normalizes these names with an extended semantic similarity-based method, next extracts PPIs by combining feature-based, convolution tree, and graph kernels, and finally visualizes the resulting PPI network. Experimental evaluations show that PPIExtractor can achieve state-of-the-art performance on a DIP subset relative to comparable evaluations.
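The abstract states that PPIExtractor combines feature-based, convolution tree, and graph kernels but does not say how. The sketch below illustrates one common way to combine kernels, assuming each kernel has already produced a precomputed Gram matrix over the same candidate protein pairs: cosine-normalize each matrix, then take a non-negative weighted sum. This is not presented as PPIExtractor's actual method; the function names (`normalize_kernel`, `combine_kernels`), the weights, and the toy matrices are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def normalize_kernel(k: Matrix) -> Matrix:
    """Cosine-normalize a Gram matrix: K'(i, j) = K(i, j) / sqrt(K(i, i) * K(j, j))."""
    n = len(k)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = (k[i][i] * k[j][j]) ** 0.5
            out[i][j] = k[i][j] / denom if denom > 0 else 0.0
    return out


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Hypothetical combination: weighted sum of normalized Gram matrices.

    A non-negative weighted sum of valid (positive semidefinite) kernels is
    itself a valid kernel, so the result could feed a kernel classifier
    that accepts precomputed Gram matrices.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for k, w in zip(kernels, weights):
        nk = normalize_kernel(k)
        for i in range(n):
            for j in range(n):
                combined[i][j] += (w / total) * nk[i][j]
    return combined


# Toy Gram matrices over two candidate pairs (illustrative values only).
feature_k = [[4.0, 2.0], [2.0, 4.0]]
tree_k = [[9.0, 0.0], [0.0, 1.0]]
graph_k = [[1.0, 1.0], [1.0, 1.0]]

combined = combine_kernels([feature_k, tree_k, graph_k], [1.0, 1.0, 2.0])
print(combined)  # [[1.0, 0.625], [0.625, 1.0]]
```

Normalizing first keeps a kernel with large raw values (here the tree kernel) from dominating the sum; the relative weights then control each kernel's contribution directly. In practice such weights would be tuned on held-out data.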