Mapping Drug Terms via Integration of a Retrieval-Augmented Generation Algorithm with a Large Language Model.

Authors

Kimura Eizen, Kawakami Yukinobu, Inoue Shingo, Okajima Ai

Affiliations

Department of Medical Informatics, Medical School of Ehime University, Toon, Ehime, Japan.

Yuimedi Inc., Tokyo, Japan.

Publication information

Healthc Inform Res. 2024 Oct;30(4):355-363. doi: 10.4258/hir.2024.30.4.355. Epub 2024 Oct 31.

Abstract

OBJECTIVES

This study evaluated the efficacy of integrating a retrieval-augmented generation (RAG) model and a large language model (LLM) to improve the accuracy of drug name mapping across international vocabularies.

METHODS

Drug ingredient names were translated into English using the Japanese Accepted Names for Pharmaceuticals. Drug concepts were extracted from the standard vocabulary of the Observational Health Data Sciences and Informatics (OHDSI) program, and the accuracy of mappings between the translated terms and RxNorm was assessed by vector similarity, with BioBERT-generated embedding vectors serving as the baseline. Subsequently, we developed LLMs with RAG that selected the final candidates from those retrieved by the baseline. We assessed the efficacy of the LLM with RAG in candidate selection by comparing it with conventional methods based on vector similarity.
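The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors are made-up stand-ins for BioBERT embeddings, the function names `cosine`, `top_k`, and `build_prompt` are hypothetical, and the prompt wording is an assumption. The sketch ranks candidate concepts by cosine similarity and then formats the top candidates into a prompt that an LLM could be asked to choose from.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, concept_vecs, k):
    """Return the names of the k concepts most similar to the query vector."""
    scored = [(cosine(query_vec, vec), name) for name, vec in concept_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


def build_prompt(term, candidates):
    """Format the retrieved candidates as context for an LLM (wording is illustrative)."""
    lines = [f"{i}. {name}" for i, name in enumerate(candidates, start=1)]
    return (
        f"Source drug term: {term}\n"
        "Candidate concepts:\n"
        + "\n".join(lines)
        + "\nAnswer with the number of the best match, or 0 if none fits."
    )


# Toy vectors standing in for real embeddings (assumed values, not real data).
concept_vecs = {
    "acetaminophen": [0.9, 0.1, 0.0],
    "acetazolamide": [0.7, 0.6, 0.1],
    "ibuprofen": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of the translated source term

candidates = top_k(query_vec, concept_vecs, k=2)
prompt = build_prompt("acetaminophen", candidates)
print(prompt)
```

In a real pipeline the prompt would be sent to a model such as those named in the results, and the baseline would simply take the top-ranked candidate without the LLM step.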

RESULTS

The evaluation metrics demonstrated the superior performance of the combined LLM + RAG approach over traditional vector similarity methods. Notably, the hit rates of the Mixtral 8x7b and GPT-3.5 models exceeded 90%, significantly outperforming the baseline rate of 64% across stratified groups of oral drugs, injections, and all interventions. Furthermore, the r-precision metric, which measures the alignment between model judgment and human evaluation, revealed a notable improvement in LLM performance, with values ranging from 41% to 50%, compared with the baseline of 23%.
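For readers unfamiliar with the two metrics, the following sketch shows how hit rate and R-precision are commonly computed in information retrieval. It assumes the standard definitions (a hit means at least one correct concept appears among the top k candidates; R-precision is the fraction of correct items among the top R results, where R is the number of correct items). The abstract does not spell out the exact computation used in the study, so the function names and toy data here are illustrative only.

```python
def hit_rate(ranked_lists, relevant_sets, k):
    """Fraction of queries with at least one relevant item in the top k."""
    hits = sum(
        1
        for ranked, relevant in zip(ranked_lists, relevant_sets)
        if set(ranked[:k]) & relevant
    )
    return hits / len(ranked_lists)


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for item in ranked[:r] if item in relevant) / r


# Toy example: two queries with ranked candidates and human-judged answers.
ranked_lists = [["a", "b", "c"], ["d", "e", "f"]]
relevant_sets = [{"b"}, {"x"}]

overall_hit_rate = hit_rate(ranked_lists, relevant_sets, k=3)
example_r_precision = r_precision(["a", "b", "c"], {"a", "c"})
print(overall_hit_rate, example_r_precision)
```

Here the first query has a correct concept in its top three and the second does not, and the R-precision example has one of its two correct items within the top two ranks.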

CONCLUSIONS

Integrating RAG with an LLM outperformed conventional string comparison and embedding vector similarity techniques, offering a more refined approach to global drug information mapping.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c592/11570653/80a198bd39df/hir-2024-30-4-355f1.jpg
