Kimura Eizen, Kawakami Yukinobu, Inoue Shingo, Okajima Ai
Department of Medical Informatics, Medical School of Ehime University, Toon, Ehime, Japan.
Yuimedi Inc., Tokyo, Japan.
Healthc Inform Res. 2024 Oct;30(4):355-363. doi: 10.4258/hir.2024.30.4.355. Epub 2024 Oct 31.
This study evaluated the efficacy of integrating a retrieval-augmented generation (RAG) model and a large language model (LLM) to improve the accuracy of drug name mapping across international vocabularies.
Drug ingredient names were translated into English using the Japanese Accepted Names for Pharmaceuticals. Drug concepts were extracted from the standard vocabulary of Observational Health Data Sciences and Informatics (OHDSI), and the accuracy of mappings between the translated terms and RxNorm was assessed by vector similarity, using BioBERT-generated embedding vectors as the baseline. We then developed LLMs with RAG that selected the final candidates from those returned by the baseline, and assessed their efficacy in candidate selection by comparing them with conventional methods based on vector similarity.
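The vector-similarity baseline described above can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and uses small hand-written toy vectors in place of real BioBERT embeddings; the names `cosine_similarity`, `top_k_candidates`, and `TOY_CONCEPTS` are hypothetical and are not taken from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def top_k_candidates(query_vec, concept_vecs, k=3):
    """Return the k concept names most similar to the query, best first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in concept_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for embeddings (assumption:
# real embeddings would come from an encoder such as BioBERT).
TOY_CONCEPTS = {
    "acetaminophen": [0.9, 0.1, 0.0],
    "aspirin": [0.1, 0.9, 0.0],
    "amoxicillin": [0.0, 0.2, 0.9],
}

query = [0.8, 0.2, 0.1]  # toy vector for a translated ingredient name
candidates = top_k_candidates(query, TOY_CONCEPTS, k=2)
for name, score in candidates:
    print(f"{name}: {score:.3f}")
```

In a setup like the one described, a candidate list of this kind would be what the LLM with RAG receives in order to choose the final mapping.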
The evaluation metrics demonstrated the superior performance of the LLM combined with RAG over traditional vector similarity methods. Notably, the hit rates of the Mixtral 8x7b and GPT-3.5 models exceeded 90%, significantly outperforming the baseline rate of 64% across the stratified groups of oral drugs, injections, and all interventions. Furthermore, the r-precision metric, which measures the alignment between model judgment and human evaluation, showed a notable improvement for the LLMs, with values ranging from 41% to 50% compared with the baseline of 23%.
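For readers unfamiliar with the two metrics, the following sketch computes them on toy data. It assumes the standard information-retrieval definitions: hit rate as the fraction of queries whose candidate list contains at least one correct concept, and r-precision as the precision within the top R ranks, where R is the number of correct concepts for that query. Whether the study used exactly these definitions is an assumption, and all identifiers and data here are hypothetical.

```python
def hit_rate(ranked_lists, relevant_sets):
    """Fraction of queries with at least one relevant item among the candidates."""
    hits = sum(
        1
        for ranked, relevant in zip(ranked_lists, relevant_sets)
        if any(item in relevant for item in ranked)
    )
    return hits / len(ranked_lists)


def r_precision(ranked, relevant):
    """Precision within the top R ranks, where R = number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0  # undefined without relevant items; report 0 by convention
    return sum(1 for item in ranked[:r] if item in relevant) / r


# Toy data: three queries, each with a ranked candidate list and a set of
# concepts judged correct by a human reviewer (all values are made up).
ranked_lists = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
relevant_sets = [{"a", "c"}, {"x"}, {"h"}]

overall_hit_rate = hit_rate(ranked_lists, relevant_sets)
per_query_rp = [r_precision(r, s) for r, s in zip(ranked_lists, relevant_sets)]
mean_rp = sum(per_query_rp) / len(per_query_rp)

print(f"hit rate: {overall_hit_rate:.3f}")
print(f"mean r-precision: {mean_rp:.3f}")
```

The two metrics can diverge, as they do in the toy data: a query counts as a hit if a correct concept appears anywhere in the list, whereas r-precision rewards only correct concepts placed at the top ranks.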
Integrating RAG with an LLM outperformed conventional string comparison and embedding vector similarity techniques, offering a more refined approach to global drug information mapping.