Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore, 641046, Tamil Nadu, India.
Database (Oxford). 2013 Jan 15;2013:bas052. doi: 10.1093/database/bas052. Print 2013.
One of the most common and challenging problems in biomedical text mining is mining protein-protein interactions (PPIs) from MEDLINE abstracts and full-text research articles, because PPIs play a major role in understanding biological processes and the impact of proteins in diseases. We implemented PPInterFinder, a web-based text mining tool that extracts human PPIs from biomedical literature. PPInterFinder uses co-occurrences of relation keywords with protein names to extract information on PPIs from MEDLINE abstracts, and it works in three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence against a set of 11 specific patterns based on the syntactic nature of the PPI pair. We find that PPInterFinder predicts PPIs with an accuracy of 66.05% on the AIMED corpus and outperforms most existing systems. DATABASE URL: http://www.biomining-bu.in/ppinterfinder/
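The three phases described in the abstract can be illustrated with a minimal Python sketch. This is not the PPInterFinder implementation: it substitutes plain token matching for the Tregex-based parsing, uses a single positional pattern in place of the 11 syntactic patterns, and relies on toy dictionaries. The names `RELATION_KEYWORDS`, `PROTEIN_NAMES`, `find_relation_keywords`, `find_candidate_pairs`, and `extract_ppis` are hypothetical and chosen only for illustration.

```python
import re
from itertools import combinations

# Hypothetical toy lexicons; the tool's actual dictionaries are not given in the abstract.
RELATION_KEYWORDS = {"interacts", "binds", "phosphorylates", "interaction", "binding"}
PROTEIN_NAMES = {"BRCA1", "BARD1", "TP53", "MDM2"}


def tokenize(sentence):
    """Split a sentence into alphanumeric tokens (a stand-in for real parsing)."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def find_relation_keywords(tokens):
    """Phase 1: return positions of tokens found in the relation keyword dictionary."""
    return [i for i, tok in enumerate(tokens) if tok.lower() in RELATION_KEYWORDS]


def find_candidate_pairs(tokens):
    """Phase 2: return pairs of distinct protein mentions that co-occur in the sentence."""
    mentions = [(i, tok) for i, tok in enumerate(tokens) if tok in PROTEIN_NAMES]
    return [(a, b) for a, b in combinations(mentions, 2) if a[1] != b[1]]


def extract_ppis(sentence):
    """Phase 3: keep candidate pairs that match one simple pattern,
    'PROTEIN1 ... keyword ... PROTEIN2' (assumed here for illustration)."""
    tokens = tokenize(sentence)
    keyword_positions = find_relation_keywords(tokens)
    results = []
    for (i, protein1), (j, protein2) in find_candidate_pairs(tokens):
        for k in keyword_positions:
            if i < k < j:  # keyword lies between the two protein mentions
                results.append((protein1, tokens[k].lower(), protein2))
                break
    return results


print(extract_ppis("BRCA1 interacts with BARD1 in the nucleus."))
```

A sentence with two protein names but no relation keyword yields no interaction in this sketch, which reflects the keyword co-occurrence requirement stated in the abstract.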