Logical definition-based identification of potential missing concepts in SNOMED CT.

Author information

School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.

Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA.

Publication information

BMC Med Inform Decis Mak. 2023 May 9;23(Suppl 1):87. doi: 10.1186/s12911-023-02183-7.

DOI:10.1186/s12911-023-02183-7

PMID:37161566

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10169302/

Abstract

BACKGROUND

Biomedical ontologies are representations of biomedical knowledge that provide terms with precisely defined meanings. They play a vital role in facilitating biomedical research in a cross-disciplinary manner. Quality issues of biomedical ontologies will hinder their effective usage. One such quality issue is missing concepts. In this study, we introduce a logical definition-based approach to identify potential missing concepts in SNOMED CT. A unique contribution of our approach is that it is capable of obtaining both logical definitions and fully specified names for potential missing concepts.

METHOD

The logical definitions of unrelated pairs of fully defined concepts in non-lattice subgraphs that indicate quality issues are intersected to generate the logical definitions of potential missing concepts. A text summarization model (called PEGASUS) is fine-tuned to predict the fully specified names of the potential missing concepts from their generated logical definitions. Furthermore, the identified potential missing concepts are validated using external resources including the Unified Medical Language System (UMLS), biomedical literature in PubMed, and a newer version of SNOMED CT.
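
The intersection step can be illustrated with a minimal sketch. It assumes a logical definition can be flattened into a set of (attribute, value) pairs and that intersecting means keeping the pairs shared by both concepts; the abstract does not specify the actual representation, which may also involve role groups and subsumption between values. The concept contents, `intersect_definitions`, and `serialize` are hypothetical names used only for illustration.

```python
from typing import FrozenSet, Tuple

# Assumption: a logical definition is flattened to (attribute, value) pairs.
Definition = FrozenSet[Tuple[str, str]]


def intersect_definitions(a: Definition, b: Definition) -> Definition:
    """Keep only the attribute-value pairs shared by both definitions."""
    return a & b


def serialize(definition: Definition) -> str:
    """Render a definition as text, e.g. as input to a summarization model.

    The 'attribute: value; ...' layout is an assumption, not the paper's format.
    """
    return "; ".join(f"{attr}: {value}" for attr, value in sorted(definition))


# Toy definitions (illustrative only, not copied from SNOMED CT).
open_fracture = frozenset({
    ("Associated morphology", "Fracture, open"),
    ("Finding site", "Bone structure of femur"),
})
closed_fracture = frozenset({
    ("Associated morphology", "Fracture, closed"),
    ("Finding site", "Bone structure of femur"),
})

candidate = intersect_definitions(open_fracture, closed_fracture)
print(serialize(candidate))
```

In this toy case the two concepts share only the finding site, so the candidate definition describes a more general concept that could sit above both of them.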

RESULTS

From the March 2021 US Edition of SNOMED CT, we obtained a total of 30,313 unique logical definitions for potential missing concepts through the intersecting process. We fine-tuned a PEGASUS summarization model with 289,169 training instances and tested it on 36,146 instances. The model achieved a ROUGE-1 of 72.83, a ROUGE-2 of 51.06, and a ROUGE-L of 71.76 on the test dataset. The model correctly predicted 11,549 of the 36,146 fully specified names in the test dataset (31.95%). Applying the fine-tuned model to the 30,313 unique logical definitions identified a total of 23,031 potential missing concepts. Of these, 2,312 (10.04%) were automatically validated by at least one of the three resources.
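
To make the reported metrics concrete, the following sketch computes a simplified ROUGE-N F1 score and re-derives the two percentages from the counts above. It assumes lowercase whitespace tokenization and no stemming, so it will not exactly reproduce scores from a standard ROUGE toolkit; the abstract does not say which implementation was used. The example names passed to `rouge_n_f1` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference: str, prediction: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1: clipped n-gram overlap between two strings."""
    ref = ngrams(reference.lower().split(), n)
    pred = ngrams(prediction.lower().split(), n)
    overlap = sum((ref & pred).values())  # multiset intersection clips counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and predicted fully specified names.
reference = "Fracture of femur (disorder)"
prediction = "Open fracture of femur (disorder)"
r1 = rouge_n_f1(reference, prediction, n=1)
r2 = rouge_n_f1(reference, prediction, n=2)

# Percentages derived from the counts reported in the abstract.
exact_match_pct = round(100 * 11549 / 36146, 2)
validated_pct = round(100 * 2312 / 23031, 2)
print(r1, r2, exact_match_pct, validated_pct)
```

The gap between the overlap-based ROUGE scores and the exact-match rate reflects that a predicted name can share most of its words with the reference without matching it exactly.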

CONCLUSIONS

The results showed that our logical definition-based approach for identification of potential missing concepts in SNOMED CT is encouraging. Nevertheless, there is still room for improving the performance of naming concepts based on logical definitions.

Similar articles

Leveraging non-lattice subgraphs for suggestion of new concepts for SNOMED CT.

Proceedings (IEEE Int Conf Bioinformatics Biomed). 2021 Dec;2021:1805-1812. doi: 10.1109/bibm52615.2021.9669407.

Auditing SNOMED CT hierarchical relations based on lexical features of concepts in non-lattice subgraphs.

J Biomed Inform. 2018 Feb;78:177-184. doi: 10.1016/j.jbi.2017.12.010. Epub 2017 Dec 20.

Identification of missing concepts in biomedical terminologies using sequence-based formal concept analysis.

BMC Med Inform Decis Mak. 2021 Nov 9;21(Suppl 7):234. doi: 10.1186/s12911-021-01592-w.

A substring replacement approach for identifying missing IS-A relations in SNOMED CT.

Proceedings (IEEE Int Conf Bioinformatics Biomed). 2022 Dec;2022:2611-2618. doi: 10.1109/bibm55620.2022.9995595. Epub 2023 Jan 2.

Leveraging logical definitions and lexical features to detect missing IS-A relations in biomedical terminologies.

J Biomed Semantics. 2024 May 1;15(1):6. doi: 10.1186/s13326-024-00309-y.

A deep learning approach to identify missing is-a relations in SNOMED CT.

J Am Med Inform Assoc. 2023 Feb 16;30(3):475-484. doi: 10.1093/jamia/ocac248.

From lexical regularities to axiomatic patterns for the quality assurance of biomedical terminologies and ontologies.

J Biomed Inform. 2018 Aug;84:59-74. doi: 10.1016/j.jbi.2018.06.008. Epub 2018 Jun 14.

Integrating cancer diagnosis terminologies based on logical definitions of SNOMED CT concepts.

J Biomed Inform. 2017 Oct;74:46-58. doi: 10.1016/j.jbi.2017.08.013. Epub 2017 Aug 24.

A transformation-based method for auditing the IS-A hierarchy of biomedical terminologies in the Unified Medical Language System.

J Am Med Inform Assoc. 2020 Oct 1;27(10):1568-1575. doi: 10.1093/jamia/ocaa123.
