Cui Hejie, Mao Lingjun, Liang Xin, Zhang Jieyu, Ren Hui, Li Quanzheng, Li Xiang, Yang Carl
Stanford University.
Emory University.
Adv Neural Inf Process Syst. 2024 Dec;37:96449-96467.
Recent advancements in multimodal foundation models have showcased impressive capabilities in understanding and reasoning with visual and textual information. Adapting these foundation models trained for general usage to specialized domains like biomedicine requires large-scale domain-specific instruction datasets. While existing works have explored curating such datasets automatically, the resultant datasets are not explicitly aligned with domain expertise. In this work, we propose a data-centric framework, Biomedical Visual Instruction Tuning with Clinician Preference Alignment (BioMed-VITAL), that incorporates clinician preferences into both stages of generating and selecting instruction data for tuning biomedical multimodal foundation models. First, during the generation stage, we prompt the GPT-4V generator with a diverse set of clinician-selected demonstrations for preference-aligned data candidate generation. Then, during the selection phase, we train a separate selection model, which explicitly distills clinician and policy-guided model preferences into a rating function to select high-quality data for medical instruction tuning. Results show that the model tuned with the instruction-following data from our method demonstrates a significant improvement in open visual chat (18.5% relatively) and medical VQA (win rate up to 81.73%). Our instruction-following data and models are available at https://BioMed-VITAL.github.io.