

Leveraging logical definitions and lexical features to detect missing IS-A relations in biomedical terminologies.

Affiliations

Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX, USA.

McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.

Publication information

J Biomed Semantics. 2024 May 1;15(1):6. doi: 10.1186/s13326-024-00309-y.

Abstract

Biomedical terminologies play a vital role in managing biomedical data. Missing IS-A relations in a biomedical terminology can be detrimental to its downstream use. In this paper, we investigate an approach combining logical definitions and lexical features to discover missing IS-A relations in two biomedical terminologies: SNOMED CT and the National Cancer Institute (NCI) thesaurus. The method is applied to unrelated concept pairs within non-lattice subgraphs, which are graph fragments within a terminology that are likely to contain various inconsistencies. Our approach first checks whether the logical definition of one concept is more general than that of the other concept. Then, we check whether the lexical features of the first concept are contained in those of the other concept. If both constraints are satisfied, we suggest a potentially missing IS-A relation between the two concepts. The method identified 982 potentially missing IS-A relations for SNOMED CT and 100 for the NCI thesaurus. To assess the efficacy of our approach, domain experts evaluated a random sample of results belonging to the "Clinical Findings" and "Procedure" subhierarchies of SNOMED CT and of results belonging to the "Drug, Food, Chemical or Biomedical Material" subhierarchy of the NCI thesaurus. The evaluation showed that 118 of 150 suggestions are valid for SNOMED CT and 17 of 20 are valid for the NCI thesaurus.
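The two-constraint check described above can be illustrated with a minimal sketch. It assumes a heavily simplified model that is not the authors' implementation: a logical definition is a set of `(relation, target)` pairs (stated parents appear as `("is_a", parent)`), concept A counts as "more general" than B when every defining pair of A is matched in B by the same relation with an equal or more specific target, and lexical features are simply the lowercase words of the concept name. The names `Concept`, `more_general`, `lexical_features`, and `suggest_missing_is_a`, and the toy concepts below, are hypothetical and are not actual SNOMED CT content; the input list is assumed to come from a single non-lattice subgraph.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Concept:
    name: str
    # Hypothetical simplified logical definition: (relation, target) pairs,
    # including ("is_a", parent) for each stated parent.
    definition: frozenset


def ancestors(name, is_a):
    """Return all ancestors of `name`, given is_a: concept name -> set of parent names."""
    seen, stack = set(), list(is_a.get(name, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, ()))
    return seen


def subsumes_or_equal(general, specific, is_a):
    """True if `general` equals `specific` or is one of its ancestors."""
    return general == specific or general in ancestors(specific, is_a)


def more_general(a, b, is_a):
    """Assumed generality test: every defining pair of `a` is matched in `b`
    by the same relation with an equal or more specific target."""
    return all(
        any(rel_b == rel_a and subsumes_or_equal(tgt_a, tgt_b, is_a)
            for rel_b, tgt_b in b.definition)
        for rel_a, tgt_a in a.definition
    )


def lexical_features(name):
    """Assumed lexical features: the set of lowercase words in the concept name."""
    return set(name.lower().split())


def suggest_missing_is_a(concepts, is_a):
    """Return (child, parent) pairs that satisfy both constraints."""
    suggestions = []
    for a, b in permutations(concepts, 2):
        # Only consider unrelated pairs: no existing IS-A path in either direction.
        if subsumes_or_equal(a.name, b.name, is_a) or subsumes_or_equal(b.name, a.name, is_a):
            continue
        if more_general(a, b, is_a) and lexical_features(a.name) <= lexical_features(b.name):
            suggestions.append((b.name, a.name))  # suggest: b IS-A a
    return suggestions


# Toy, hypothetical hierarchy (concept name -> stated parents).
is_a = {
    "Bone structure": set(),
    "Femur structure": {"Bone structure"},
    "Disorder of bone": set(),
    "Bone fracture": {"Disorder of bone"},
    "Open bone fracture": {"Disorder of bone"},
    "Fracture of femur": {"Disorder of bone"},
}
concepts = [
    Concept("Bone fracture", frozenset({("is_a", "Disorder of bone"),
                                        ("finding_site", "Bone structure")})),
    Concept("Open bone fracture", frozenset({("is_a", "Disorder of bone"),
                                             ("finding_site", "Bone structure"),
                                             ("morphology", "Open fracture")})),
    Concept("Fracture of femur", frozenset({("is_a", "Disorder of bone"),
                                            ("finding_site", "Femur structure")})),
]

print(suggest_missing_is_a(concepts, is_a))  # [('Open bone fracture', 'Bone fracture')]
```

In this toy data, "Bone fracture" is logically more general than both other concepts, but only "Open bone fracture" also contains its words, so the lexical constraint filters out the "Fracture of femur" pair. This mirrors, under the stated assumptions, why the two constraints are combined rather than used alone.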


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdd0/11064313/27f186fc4219/13326_2024_309_Fig1_HTML.jpg
