Wu Da, Wang Zhanliang, Nguyen Quan, Xu Zhuoran, Wang Kai
Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
Applied Mathematics and Computational Science Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA.
ArXiv. 2025 May 9:arXiv:2505.05736v1.
The scarcity of high-quality multimodal biomedical data limits the ability to effectively fine-tune pretrained Large Language Models (LLMs) for specialized biomedical tasks. To address this challenge, we introduce MINT (Multimodal Integrated kNowledge Transfer), a framework that aligns unimodal large decoder models with domain-specific decision patterns from high-quality multimodal biomedical data through preference optimization. While MINT supports different optimization techniques, we primarily implement it with the Odds Ratio Preference Optimization (ORPO) framework as its backbone. This strategy enables the aligned LLMs to perform predictive tasks using text-only or image-only inputs while retaining knowledge learned from multimodal data. MINT leverages an upstream multimodal machine learning (MML) model trained on high-quality multimodal data to transfer domain-specific insights to downstream text-only or image-only LLMs. We demonstrate MINT's effectiveness through two key applications: (1) rare genetic disease prediction from text, where MINT uses a multimodal encoder model trained on facial photos and clinical notes to generate a preference dataset for aligning a lightweight text-only decoder LLM (Llama 3.2-3B-Instruct). Despite relying on text input only, the MINT-derived model outperforms models trained with Supervised Fine-Tuning (SFT), Retrieval-Augmented Generation (RAG), or Direct Preference Optimization (DPO), and even outperforms a much larger foundation model (Llama 3.1-405B-Instruct). (2) Tissue type classification using cell nucleus images, where MINT uses a vision-language foundation model as the preference generator, leveraging knowledge learned from both text and histopathological images to align downstream image-only models. The resulting MINT-derived model significantly improves the performance of Llama 3.2-Vision-11B-Instruct on tissue type classification.
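The ORPO backbone named above combines a standard supervised (NLL) term on the preferred response with a log-odds-ratio penalty that pushes the model away from the dispreferred one. As an illustration only (not the authors' code), a minimal sketch of a per-example ORPO-style loss, assuming length-normalized sequence log-probabilities and a hypothetical weight `lam`:

```python
import math

def seq_logprob(token_logprobs):
    # Length-normalized log-probability of a response, as used in ORPO.
    return sum(token_logprobs) / len(token_logprobs)

def orpo_loss(chosen_logps, rejected_logps, lam=0.1):
    """Sketch of the odds-ratio preference loss.

    chosen_logps / rejected_logps: per-token log-probs under the policy
    for the preferred and dispreferred responses (a preference pair,
    e.g. one generated by the upstream multimodal model).
    lam: hypothetical weight on the odds-ratio term vs. the NLL term.
    """
    logp_w = seq_logprob(chosen_logps)
    logp_l = seq_logprob(rejected_logps)
    # log odds(y) = log p - log(1 - p); log1p keeps this numerically stable.
    log_odds_w = logp_w - math.log1p(-math.exp(logp_w))
    log_odds_l = logp_l - math.log1p(-math.exp(logp_l))
    log_odds_ratio = log_odds_w - log_odds_l
    # -log sigmoid(ratio): small when the chosen answer is far more likely.
    l_or = -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))
    l_sft = -logp_w  # ordinary NLL on the preferred response
    return l_sft + lam * l_or
```

With `lam=0` this reduces to plain supervised fine-tuning on the chosen response; the odds-ratio term is what transfers the upstream model's preferences into the unimodal LLM.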
In summary, MINT provides an effective strategy for aligning unimodal LLMs with high-quality multimodal expertise through preference optimization. Our study also highlights a hybrid strategy that grafts the strengths of encoder models on classification tasks onto large decoder models to enhance reasoning, improve predictive performance, and reduce hallucination in biomedical applications.