Luo Lingyun, Feng Jingtao, Yu Huijun, Wang Jiaolong
School of Computer Science, University of South China, Hengyang, China.
Hunan Medical Big Data International Science and Technology Innovation Cooperation Base, Hengyang, China.
JMIR Med Inform. 2020 Nov 25;8(11):e22333. doi: 10.2196/22333.
As the manual creation and maintenance of biomedical ontologies are labor-intensive, automatic aids are desirable in the lifecycle of ontology development.
Given a set of concept names from the Foundational Model of Anatomy (FMA), we propose a method for automatically generating both the taxonomy and the partonomy structures among them.
Our approach comprises 2 main tasks. The first is to predict the direct relation between 2 given concept names, using word embedding methods and training 2 machine learning models: Convolutional Neural Networks (CNN) and Bidirectional Long Short-term Memory Networks (Bi-LSTM). The second is to identify the semantic structures among a group of given concept names with an original granularity-based method that leverages these trained models.
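To make the relationship between the 2 tasks concrete, here is a minimal sketch of the simple pairwise strategy: a pair predictor is applied to every ordered pair of names, and the pairs predicted as direct relations become the edges of the structure. Everything in it is an assumption for illustration. `toy_predict_direct` is a hypothetical stand-in for a trained CNN or Bi-LSTM (a word-suffix heuristic, not a learned model), the concept names are examples only, and the granularity-based method itself is not reproduced here.

```python
from itertools import permutations
from typing import Callable, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (child name, parent name)


def toy_predict_direct(child: str, parent: str) -> bool:
    """Hypothetical stand-in for a trained pair classifier.

    Predicts a direct relation when the child name is the parent name
    preceded by exactly one extra modifier word.
    """
    c, p = child.lower().split(), parent.lower().split()
    return len(c) == len(p) + 1 and c[1:] == p


def pairwise_structure(
    names: Iterable[str],
    predict_direct: Callable[[str, str], bool],
) -> Set[Edge]:
    """Query the predictor on every ordered pair and keep the positive ones."""
    return {
        (a, b)
        for a, b in permutations(list(names), 2)
        if predict_direct(a, b)
    }


# Illustrative concept names (assumed for this example).
names = ["Artery", "Coronary artery", "Left coronary artery", "Vein"]
edges = pairwise_structure(names, toy_predict_direct)
for child, parent in sorted(edges):
    print(f"{child} -> {parent}")
```

With n names this strategy issues n(n-1) predictions and treats each pair independently, so a single misclassified pair directly adds or removes an edge in the resulting structure.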
The results show that both CNN and Bi-LSTM perform well on the first task, with F1 measures above 0.91. On the second task, our approach with Bi-LSTM achieves an average F1 measure of 0.79 over 100 case studies in the FMA, outperforming the primitive pairwise-based method.
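For reference, the F1 measure is the harmonic mean of precision and recall. The sketch below computes it over sets of relation edges, comparing a predicted structure against a reference structure. Scoring at the level of (child, parent) edges is an assumption made for illustration; the exact evaluation protocol is not stated here, and the edge labels are placeholders.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (child name, parent name)


def edge_f1(predicted: Set[Edge], gold: Set[Edge]) -> float:
    """F1 of predicted edges against reference edges (0.0 if no overlap)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Placeholder structures: 2 of the 3 predicted edges match the reference.
gold = {("B", "A"), ("C", "B"), ("D", "A")}
predicted = {("B", "A"), ("C", "A"), ("D", "A")}
score = edge_f1(predicted, gold)
print(f"F1 = {score:.3f}")
```

An average over case studies would then be the mean of this score across the individual cases.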
We have investigated an automatic way of predicting a hierarchical relationship between 2 concept names and, building on this, have developed a method for structuring a group of concept names automatically. This study is an initial investigation that should inform further work on the automatic creation and enrichment of biomedical ontologies.