Bakal Gokhan, Kavuluru Ramakanth
Department of Computer Science, University of Kentucky, Lexington, KY, USA.
Division of Biomedical Informatics, Department of Biostatistics, University of Kentucky, Lexington, KY, USA.
Min Intell Knowl Explor (2015). 2015 Dec;9468:586-596. doi: 10.1007/978-3-319-26832-3_55. Epub 2016 Jan 3.
Identifying new potential treatment options (for example, medications and procedures) for known medical conditions that cause human disease burden is a central task of biomedical research. Since not all candidate drugs can be tested in animal and clinical trials, in vitro approaches are first used to identify promising candidates. Even before this step, thanks to recent advances, in silico or computational approaches are also being employed to identify viable treatment options. Generally, natural language processing (NLP) and machine learning are used to predict specific relations between a given pair of entities using the distant supervision approach. In this paper, we report preliminary results on predicting treatment relations between biomedical entities based purely on semantic patterns over biomedical knowledge graphs. As such, we refrain from explicitly using NLP, although the knowledge graphs themselves may be built from NLP extractions. Our intuition is fairly straightforward: entities that participate in a treatment relation may be connected by similar path patterns in biomedical knowledge graphs extracted from scientific literature. Using a dataset of treatment relation instances derived from the well-known Unified Medical Language System (UMLS), we verify our intuition by employing graph path patterns from a well-known knowledge graph as features in machine-learned models. We achieve high recall (92%), but precision decreases from 95% to an acceptable 71% as we go from a uniform class distribution to a tenfold increase in negative instances. We also demonstrate that models trained with patterns of length ≤ 3 yield statistically significant gains in F-score over those trained with patterns of length ≤ 2. Our results show the potential of exploiting knowledge graphs for relation extraction, and we believe this is the first effort to employ graph patterns as features for identifying biomedical relations.
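To make the idea of graph path patterns as features concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how patterns are encoded, so several things here are assumptions: the toy triples, the entity and predicate names, the `^-1` suffix for traversing an edge against its direction, and the helper names `build_adjacency` and `path_patterns` are all hypothetical. A pattern is taken to be the sequence of predicates along a simple path between a candidate pair, and the feature value is the number of paths that share that pattern.

```python
from collections import Counter, defaultdict

# Toy knowledge graph as (subject, predicate, object) triples.
# All entities and predicates are invented for illustration only.
TRIPLES = [
    ("drug_A", "INHIBITS", "gene_X"),
    ("gene_X", "CAUSES", "disease_D"),
    ("drug_A", "INTERACTS_WITH", "protein_P"),
    ("protein_P", "ASSOCIATED_WITH", "gene_X"),
    ("drug_B", "STIMULATES", "gene_X"),
]


def build_adjacency(triples):
    """Index triples by node; inverse edges are added so paths may
    traverse an edge against its direction (an assumption)."""
    adj = defaultdict(list)
    for subj, pred, obj in triples:
        adj[subj].append((pred, obj))
        adj[obj].append((pred + "^-1", subj))
    return adj


def path_patterns(adj, source, target, max_len=3):
    """Count predicate sequences over simple paths of at most
    max_len edges from source to target."""
    counts = Counter()

    def walk(node, visited, preds):
        if node == target and preds:
            counts[tuple(preds)] += 1
            return
        if len(preds) == max_len:
            return
        for pred, nxt in adj[node]:
            if nxt not in visited:
                walk(nxt, visited | {nxt}, preds + [pred])

    walk(source, {source}, [])
    return counts


if __name__ == "__main__":
    adj = build_adjacency(TRIPLES)
    for k in (2, 3):
        feats = path_patterns(adj, "drug_A", "disease_D", max_len=k)
        print(f"patterns of length <= {k}:", dict(feats))
```

On this toy graph, raising the limit from 2 to 3 adds one pattern that passes through `protein_P`, which illustrates how longer patterns can contribute additional features. The resulting counts could be fed to any classifier as a sparse feature vector. Which model the authors trained is not stated in the abstract.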