Vo Hung Q, Wang Lin, Wong Kelvin K, Ezeana Chika F, Yu Xiaohui, Yang Wei, Chang Jenny, Nguyen Hien V, Wong Stephen T C
IEEE J Biomed Health Inform. 2025 May;29(5):3234-3246. doi: 10.1109/JBHI.2024.3507638. Epub 2025 May 6.
Breast cancer is a pervasive global health concern among women. Leveraging multimodal data from enterprise patient databases, including Picture Archiving and Communication Systems (PACS) and Electronic Health Records (EHRs), holds promise for improving prediction. This study introduces a multimodal deep-learning model that leverages mammogram datasets to evaluate breast cancer prediction. Our approach integrates frozen large-scale pretrained vision-language models and shows superior performance and stability compared to traditional image-tabular models across two public breast cancer datasets. By pairing frozen pretrained vision-language models with a lightweight trainable classifier, the model consistently outperforms conventional full fine-tuning. The observed improvements are significant. On the CBIS-DDSM dataset, the Area Under the Curve (AUC) increases from 0.867 to 0.902 on validation and from 0.803 to 0.830 on the official test set. On the EMBED dataset, validation AUC improves from 0.780 to 0.805. In limited-data scenarios using Breast Imaging-Reporting and Data System category 3 (BI-RADS 3) cases, AUC improves from 0.91 to 0.96 on the official CBIS-DDSM test set and from 0.79 to 0.83 on a challenging validation set. This study underscores the benefits of vision-language models for jointly training on diverse image-clinical datasets from multiple healthcare institutions, effectively addressing challenges posed by non-aligned tabular features. Combining training data further improves breast cancer prediction on the EMBED dataset, outperforming all other experiments. In summary, frozen large-scale pretrained vision-language models offer superior performance and stability over conventional methods for multimodal breast cancer prediction, reinforcing their potential in this domain.
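The core idea described above, keeping a large pretrained encoder frozen and training only a lightweight classifier on its embeddings concatenated with tabular clinical features, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: NumPy is used throughout, a fixed random projection stands in for the pretrained vision-language encoder, and all data, dimensions, and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "vision-language" encoder: a stand-in for a pretrained model whose
# weights are never updated during training. Here it is a fixed random
# projection followed by a nonlinearity (illustrative only).
ENC_IN, ENC_OUT, TAB_DIM = 64, 16, 4
W_frozen = rng.normal(size=(ENC_IN, ENC_OUT))  # never receives gradients

def encode(images):
    # images: (n, ENC_IN) flattened pixels -> (n, ENC_OUT) embeddings
    return np.tanh(images @ W_frozen)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic "mammogram" pixels and tabular clinical features carrying a
# weak label-dependent signal (hypothetical data, for illustration).
n = 400
y = rng.integers(0, 2, size=n).astype(float)
images = rng.normal(size=(n, ENC_IN)) + y[:, None] * 0.3
tabular = rng.normal(size=(n, TAB_DIM)) + y[:, None] * 0.3

# Lightweight trainable head: logistic regression over the concatenation
# of frozen image embeddings and tabular features.
X = np.hstack([encode(images), tabular])  # (n, ENC_OUT + TAB_DIM)
w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1
for _ in range(300):
    p = sigmoid(X @ w + b)
    # Gradient of mean binary cross-entropy; only w and b are updated,
    # W_frozen stays fixed throughout.
    w -= lr * (X.T @ (p - y) / n)
    b -= lr * np.mean(p - y)

acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
print(f"train accuracy: {acc:.2f}")
```

Because only the small head is optimized, training is cheap and stable, and datasets whose tabular schemas differ can in principle share the same frozen encoder while each gets its own head or feature mapping.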