Weissenborn Dirk, Schroeder Michael, Tsatsaronis George
DFKI Projektbüro Berlin, Alt-Moabit 91c, Berlin, 10559 Germany; Biotechnology Center, Technische Universität Dresden, Tatzberg 47/49, Dresden, 01307 Germany.
Biotechnology Center, Technische Universität Dresden, Tatzberg 47/49, Dresden, 01307 Germany.
J Biomed Semantics. 2015 Jul 6;6:28. doi: 10.1186/s13326-015-0021-5. eCollection 2015.
The complexity and scale of knowledge in the biomedical domain have motivated research on mining heterogeneous data from both structured and unstructured knowledge bases. To this end, facts must be combined in order to formulate hypotheses or draw conclusions about the domain concepts. This work addresses the problem by using indirect knowledge that connects two concepts in a knowledge graph to discover hidden relations between them. The graph represents concepts as vertices and relations as edges, with the edges stemming from structured (ontologies) and unstructured (textual) data. In this graph, path patterns, i.e. sequences of relations, that potentially characterize a biomedical relation are mined using distant supervision.
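The notion of a path pattern described above can be illustrated with a minimal sketch. The toy graph, the relation names, the inverse-edge convention (`^-1`) and the length limit `max_len` are all assumptions made for illustration; none of them is taken from the paper, and this is not the authors' implementation.

```python
from collections import defaultdict

# Toy edges (subject, relation, object). All concept and relation names
# are hypothetical and serve only to illustrate the idea.
EDGES = [
    ("aspirin", "inhibits", "COX1"),
    ("COX1", "associated_with", "inflammation"),
    ("aspirin", "co_occurs_with", "pain"),
    ("pain", "symptom_of", "inflammation"),
    ("aspirin", "is_a", "NSAID"),
]


def build_graph(edges):
    """Build an adjacency list; each edge is also stored in inverse direction."""
    graph = defaultdict(list)
    for subj, rel, obj in edges:
        graph[subj].append((rel, obj))
        graph[obj].append((rel + "^-1", subj))  # assumed inverse-edge notation
    return graph


def path_patterns(graph, source, target, max_len=3):
    """Collect the relation sequences of all simple paths from source to target."""
    patterns = set()

    def dfs(node, visited, relations):
        if node == target and relations:
            patterns.add(tuple(relations))
            return
        if len(relations) == max_len:
            return
        for rel, nxt in graph[node]:
            if nxt not in visited:
                dfs(nxt, visited | {nxt}, relations + [rel])

    dfs(source, {source}, [])
    return patterns


graph = build_graph(EDGES)
patterns = path_patterns(graph, "aspirin", "inflammation")
for pattern in sorted(patterns):
    print(" -> ".join(pattern))
```

Under distant supervision, concept pairs already known to stand in a relation such as "may treat" would act as positive examples, and the patterns found between them could then serve as features for a classifier.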
Characteristic path patterns of biomedical relations can be identified from this representation using machine learning. For the experimental evaluation, two frequent biomedical relations, namely "has target" and "may treat", are chosen. The results suggest that relation discovery using indirect knowledge is possible, with an AUC of up to 0.8. This is a substantial improvement over random classification and shows that good predictions can be prioritized by following the suggested approach.
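To make the AUC figure concrete: the AUC equals the probability that a randomly chosen true relation receives a higher score than a randomly chosen false one, so random scoring yields about 0.5. The sketch below computes it by direct pairwise comparison. The function name `roc_auc` and the labels and scores are made up for illustration and are not results from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical candidate pairs: 1 = relation holds, 0 = it does not.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Because the AUC depends only on the ranking of the scores, a high value means that sorting candidate pairs by score places true relations near the top, which is what allows good predictions to be prioritized.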
Analysis of the results indicates that the models can successfully learn expressive path patterns for the examined relations. Furthermore, this work demonstrates that the constructed graph allows for the easy integration of heterogeneous information and discovery of indirect connections between biomedical concepts.