Advancing Korean Medical Large Language Models: Automated Pipeline for Korean Medical Preference Dataset Construction.

Authors

Seo Jean, Park Sumin, Byun Sungjoo, Choi Jinwook, Choi Jinho, Shin Hyopil

Affiliations

Department of Linguistics, Seoul National University, Seoul, Korea.

College of Humanities, Seoul National University, Seoul, Korea.

Publication

Healthc Inform Res. 2025 Apr;31(2):166-174. doi: 10.4258/hir.2025.31.2.166. Epub 2025 Apr 30.

DOI: 10.4258/hir.2025.31.2.166
PMID: 40384068
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12086433/
Abstract

OBJECTIVES

Developing large language models (LLMs) in biomedicine requires access to high-quality training and alignment tuning datasets. However, publicly available Korean medical preference datasets are scarce, which hinders the advancement of Korean medical LLMs. This study constructs the Korean Medical Preference Dataset (KoMeP), an alignment tuning dataset built with an automated pipeline that minimizes the high cost of human annotation, and evaluates its efficacy.

METHODS

KoMeP was generated using the DAHL score, an automated hallucination evaluation metric. Five LLMs (Dolly-v2-3B, MPT-7B, GPT-4o, Qwen-2-7B, Llama-3-8B) produced responses to 8,573 biomedical examination questions, from which 5,551 preference pairs were extracted. Each pair consisted of a "chosen" response and a "rejected" response, as determined by their DAHL scores. The dataset was evaluated by training five different models with each of two alignment tuning methods, direct preference optimization (DPO) and odds ratio preference optimization (ORPO). The KorMedMCQA benchmark was employed to assess the effectiveness of alignment tuning.
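
The abstract does not spell out the pairing rule, so the following is only a minimal sketch of how such a pipeline could assemble preference pairs. It assumes the highest-DAHL-scoring response per question is taken as "chosen" and the lowest as "rejected", and that questions with no score gap are discarded; the paper's actual criteria may differ.

from dataclasses import dataclass

@dataclass
class Response:
    model: str   # e.g., "GPT-4o" or "Llama-3-8B"
    text: str    # the model's answer to one biomedical examination question
    dahl: float  # automated hallucination score (assumed: higher = fewer hallucinations)

def build_preference_pairs(responses_by_question):
    """Build preference pairs from DAHL-scored responses.

    responses_by_question maps each question (str) to the list of
    Response objects that the candidate LLMs produced for it.
    """
    pairs = []
    for question, responses in responses_by_question.items():
        best = max(responses, key=lambda r: r.dahl)
        worst = min(responses, key=lambda r: r.dahl)
        if best.dahl == worst.dahl:
            continue  # all scores tied: no preference signal for this question
        pairs.append({"prompt": question,
                      "chosen": best.text,
                      "rejected": worst.text})
    return pairs

Records in this {"prompt", "chosen", "rejected"} layout are what common preference-tuning toolkits (for example, Hugging Face TRL's DPOTrainer) consume, and dropping questions without a score gap is one plausible reason fewer pairs (5,551) than questions (8,573) survive.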

RESULTS

Models trained with DPO consistently improved KorMedMCQA performance; notably, Llama-3.1-8B showed a 43.96% increase. In contrast, ORPO training produced inconsistent results. Additionally, English-to-Korean transfer learning proved effective, particularly for English-centric models like Gemma-2, whereas Korean-to-English transfer learning achieved limited success. Instruction tuning with KoMeP yielded mixed outcomes, which suggests challenges in dataset formatting.

CONCLUSIONS

KoMeP is the first publicly available Korean medical preference dataset and significantly improves alignment tuning performance in LLMs. The DPO method outperforms ORPO in alignment tuning. Future work should focus on expanding KoMeP, developing a Korean-native dataset, and refining alignment tuning methods to produce safer and more reliable Korean medical LLMs.
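
For reference, the two alignment tuning objectives compared above follow the standard formulations from the original DPO and ORPO papers; they are not restated in this abstract. DPO optimizes the policy $\pi_\theta$ directly on preference pairs against a frozen reference model $\pi_{\mathrm{ref}}$:

$$\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$

ORPO instead appends an odds-ratio penalty to the supervised fine-tuning loss and requires no reference model:

$$\mathcal{L}_{\mathrm{ORPO}} = \mathbb{E}\!\left[\mathcal{L}_{\mathrm{SFT}} + \lambda\,\mathcal{L}_{\mathrm{OR}}\right], \qquad \mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right), \qquad \mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}$$

Here $y_w$ and $y_l$ denote the "chosen" and "rejected" responses, $\sigma$ is the sigmoid function, and $\beta$ and $\lambda$ are hyperparameters.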

Figures (from the PMC full text, in order):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a96/12086433/4221bb4cdb16/hir-2025-31-2-166f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a96/12086433/67f69cc5e099/hir-2025-31-2-166f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a96/12086433/63b5b3b5b75e/hir-2025-31-2-166f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a96/12086433/ba3319ddf12f/hir-2025-31-2-166f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a96/12086433/ef6c81a1df04/hir-2025-31-2-166f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a96/12086433/d04039359d95/hir-2025-31-2-166f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a96/12086433/dc0aba9d5496/hir-2025-31-2-166f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a96/12086433/21a96e150cfc/hir-2025-31-2-166f8.jpg

Similar Articles

1. Advancing Korean Medical Large Language Models: Automated Pipeline for Korean Medical Preference Dataset Construction. Healthc Inform Res. 2025 Apr;31(2):166-174. doi: 10.4258/hir.2025.31.2.166. Epub 2025 Apr 30.
2. Privacy-ensuring Open-weights Large Language Models Are Competitive with Closed-weights GPT-4o in Extracting Chest Radiography Findings from Free-Text Reports. Radiology. 2025 Jan;314(1):e240895. doi: 10.1148/radiol.240895.
3. BioInstruct: instruction tuning of large language models for biomedical natural language processing. J Am Med Inform Assoc. 2024 Sep 1;31(9):1821-1832. doi: 10.1093/jamia/ocae122.
4. Evaluating and Enhancing Japanese Large Language Models for Genetic Counseling Support: Comparative Study of Domain Adaptation and the Development of an Expert-Evaluated Dataset. JMIR Med Inform. 2025 Jan 16;13:e65047. doi: 10.2196/65047.
5. PH-LLM: Public Health Large Language Models for Infoveillance. medRxiv. 2025 Feb 10:2025.02.08.25321587. doi: 10.1101/2025.02.08.25321587.
6. Evaluating the effectiveness of biomedical fine-tuning for large language models on clinical tasks. J Am Med Inform Assoc. 2025 Jun 1;32(6):1015-1024. doi: 10.1093/jamia/ocaf045.
7. Advancing entity recognition in biomedicine via instruction tuning of large language models. Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae163.
8. Multimodal Integrated Knowledge Transfer to Large Language Models through Preference Optimization with Biomedical Applications. ArXiv. 2025 May 9:arXiv:2505.05736v1.
9. A dataset and benchmark for hospital course summarization with adapted large language models. J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
10. Me-LLaMA: Medical Foundation Large Language Models for Comprehensive Text Analysis and Beyond. Res Sq. 2024 Dec 18:rs.3.rs-5456223. doi: 10.21203/rs.3.rs-5456223/v1.
