Two complementary AI approaches for predicting UMLS semantic group assignment: heuristic reasoning and deep learning.

Affiliation

National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.

Publication information

J Am Med Inform Assoc. 2023 Nov 17;30(12):1887-1894. doi: 10.1093/jamia/ocad152.

DOI:10.1093/jamia/ocad152

PMID:37528056

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10654847/

Abstract

OBJECTIVE

Use heuristic, deep learning (DL), and hybrid AI methods to predict semantic group (SG) assignments for new UMLS Metathesaurus atoms, with target accuracy ≥95%.

MATERIALS AND METHODS

We used train-test datasets from successive 2020AA-2022AB UMLS Metathesaurus releases. Our heuristic "waterfall" approach employed a sequence of 7 different SG prediction methods. Atoms not qualifying for a method were passed on to the next method. The DL approach generated BioWordVec and SapBERT embeddings for atom names, BioWordVec embeddings for source vocabulary names, and BioWordVec embeddings for atom names of the second-to-top nodes of an atom's source hierarchy. We fed a concatenation of the 4 embeddings into a fully connected multilayer neural network with an output layer of 15 nodes (one for each SG). For both approaches, we developed methods to estimate the probability that their predicted SG for an atom would be correct. Based on these estimations, we developed 2 hybrid SG prediction methods combining the strengths of heuristic and DL methods.
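The waterfall idea and a confidence-based hybrid can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the 7 heuristic methods or the 2 hybrid rules, so the two rule functions (`by_source`, `by_suffix`), the vocabulary name `DRUG_VOCAB`, the probability values, and the "keep the more confident prediction" rule are all assumptions made here for illustration. `CHEM`, `DISO`, and `PROC` are UMLS semantic group abbreviations.

```python
from typing import Callable, Optional, Sequence, Tuple

# (semantic group, estimated probability that the prediction is correct)
Prediction = Tuple[str, float]
# A method returns None when the atom does not qualify for it.
Method = Callable[[dict], Optional[Prediction]]


def by_source(atom: dict) -> Optional[Prediction]:
    # Hypothetical rule: a source vocabulary whose atoms all share one SG.
    table = {"DRUG_VOCAB": ("CHEM", 0.99)}
    return table.get(atom["source"])


def by_suffix(atom: dict) -> Optional[Prediction]:
    # Hypothetical rule: names ending in "itis" are treated as Disorders.
    if atom["name"].lower().endswith("itis"):
        return ("DISO", 0.90)
    return None


def waterfall_predict(atom: dict, methods: Sequence[Method]) -> Optional[Prediction]:
    """Try each method in order; atoms that do not qualify fall through to the next."""
    for method in methods:
        result = method(atom)
        if result is not None:
            return result
    return None


def hybrid_predict(heuristic: Optional[Prediction], dl: Prediction) -> Prediction:
    """Assumed combination rule: keep the prediction with the higher estimated probability."""
    if heuristic is None or dl[1] > heuristic[1]:
        return dl
    return heuristic


atom = {"name": "Appendicitis", "source": "OTHER_VOCAB"}
heuristic = waterfall_predict(atom, [by_source, by_suffix])
# The DL prediction is a stand-in tuple; a real model would produce it.
final = hybrid_predict(heuristic, ("PROC", 0.60))
```

In this example the atom does not qualify for `by_source`, falls through to `by_suffix`, and the hybrid keeps the heuristic result because its estimated probability is higher than the stand-in DL prediction's.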

RESULTS

The heuristic waterfall approach accurately predicted 94.3% of SGs for 1 563 692 new unseen atoms. The DL accuracy on the same dataset was also 94.3%. The hybrid approaches achieved an average accuracy of 96.5%.

CONCLUSION

Our study demonstrated that AI methods can predict SG assignments for new UMLS atoms with sufficient accuracy to be potentially useful as an intermediate step in the time-consuming task of assigning new atoms to UMLS concepts. We showed that for SG prediction, combining heuristic methods and DL methods can produce better results than either alone.
