Zheng Fengbo, Cui Licong
Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA.
School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2020 Dec;2020. doi: 10.1109/bibm49941.2020.9313186. Epub 2021 Jan 13.
Biomedical terminologies are increasingly used in modern biomedical research and applications to facilitate data management and ensure semantic interoperability. As terminologies evolve, new concepts are regularly added in response to changing domain knowledge and emerging applications. Most existing concept enrichment methods suggest new concepts by directly importing knowledge from external sources. In this paper, we introduce a lexical method based on formal concept analysis (FCA) that identifies potentially missing concepts in a given terminology by leveraging its intrinsic knowledge, namely its concept names. We first construct the FCA formal context from the lexical features of concepts. We then perform multistage intersection to formalize new concepts and detect potentially missing concepts. We applied our method to the sub-hierarchy in the National Cancer Institute (NCI) Thesaurus (19.08d version) and identified a total of 8,983 potentially missing concepts. As a preliminary evaluation, we checked whether these potentially missing concepts were included in any external source terminology in the Unified Medical Language System (UMLS). The results showed that 592 out of 8,937 potentially missing concepts were found in the UMLS.
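The two steps named in the abstract (building a formal context from lexical features, then intersecting repeatedly) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the lexical features are simply the lowercased words of each concept name, that intersection is applied pairwise until no new word set appears, and that a candidate is any non-empty word set not already carried by an existing concept. The function names (`build_context`, `multistage_intersection`, `found_in_external`) and the sample concept names are hypothetical.

```python
def tokenize(name):
    """Assumed lexical features: the set of lowercased words in a name."""
    return frozenset(name.lower().split())


def build_context(concept_names):
    """Formal context: each concept (object) mapped to its word set (attributes)."""
    return {name: tokenize(name) for name in concept_names}


def multistage_intersection(context, max_stages=10):
    """Intersect attribute sets stage by stage until nothing new appears.

    Returns the word sets produced by intersection that do not belong
    to any existing concept; these are the candidate missing concepts.
    """
    existing = set(context.values())
    known = set(existing)      # every word set seen so far
    frontier = set(existing)   # word sets introduced in the previous stage
    for _ in range(max_stages):
        new = set()
        for a in frontier:
            for b in known:
                common = a & b
                if common and common not in known:
                    new.add(common)
        if not new:
            break
        known |= new
        frontier = new
    return known - existing


def found_in_external(candidates, external_names):
    """Candidates whose word set matches a name in an external terminology."""
    external = {tokenize(name) for name in external_names}
    return {c for c in candidates if c in external}


# Hypothetical concept names, chosen only to illustrate the mechanics.
names = [
    "Stage I Breast Carcinoma",
    "Stage II Breast Carcinoma",
    "Stage I Lung Carcinoma",
]
candidates = multistage_intersection(build_context(names))
for words in sorted(candidates, key=sorted):
    print(sorted(words))
```

Because this sketch treats names as unordered word sets, each candidate is a bag of words rather than a well-formed concept name; turning candidates into names and filtering out meaningless ones would require further steps that the abstract does not describe. The `found_in_external` helper mirrors the preliminary UMLS check only in spirit, matching candidates against a plain list of external names.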