Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA.
School of Computing, University of Georgia, Athens, GA, USA.
Nat Med. 2024 Nov;30(11):3129-3141. doi: 10.1038/s41591-024-03185-2. Epub 2024 Aug 7.
Traditional biomedical artificial intelligence (AI) models, designed for specific tasks or modalities, often exhibit limited flexibility in real-world deployment and struggle to utilize holistic information. Generalist AI holds the potential to address these limitations due to its versatility in interpreting different data types and generating tailored outputs for diverse needs. However, existing biomedical generalist AI solutions are typically heavyweight and closed source to researchers, practitioners and patients. Here, we describe BiomedGPT, the first open-source and lightweight vision-language foundation model, designed as a generalist capable of performing various biomedical tasks. BiomedGPT achieved state-of-the-art results in 16 out of 25 experiments while maintaining a computing-friendly model scale. We also conducted human evaluations to assess the capabilities of BiomedGPT in radiology visual question answering, report generation and summarization. BiomedGPT exhibits robust prediction ability with a low error rate of 3.8% in question answering, satisfactory performance with an error rate of 8.3% in writing complex radiology reports, and competitive summarization ability with a nearly equivalent preference score to human experts. Our method demonstrates that effective training with diverse data can lead to more practical biomedical AI for improving diagnosis and workflow efficiency.