National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany.
Department of Medical Oncology, Heidelberg University Hospital, Heidelberg, Germany.
Nat Commun. 2024 Nov 21;15(1):10104. doi: 10.1038/s41467-024-51465-9.
Medical image classification requires labeled, task-specific datasets, which are used to train deep learning networks de novo or to fine-tune foundation models. However, this process is computationally and technically demanding. In language processing, in-context learning provides an alternative, where models learn from examples within prompts, bypassing the need for parameter updates. Yet, in-context learning remains underexplored in medical image analysis. Here, we systematically evaluate the model Generative Pretrained Transformer 4 with Vision capabilities (GPT-4V) on cancer image processing with in-context learning on three cancer histopathology tasks of high importance: classification of tissue subtypes in colorectal cancer, colon polyp subtyping and breast tumor detection in lymph node sections. Our results show that in-context learning is sufficient to match or even outperform specialized neural networks trained for particular tasks, while requiring only a minimal number of samples. In summary, this study demonstrates that large vision language models trained on non-domain-specific data can be applied out of the box to solve medical image-processing tasks in histopathology. This democratizes access to generalist AI models for medical experts without a technical background, especially in areas where annotated data are scarce.
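The in-context learning setup described above places the labeled samples directly inside the prompt rather than in a training loop. A minimal sketch of how such a few-shot prompt could be assembled, assuming the OpenAI-style chat message format with base64-encoded images (the function names, task wording and placeholder bytes are illustrative, not the authors' exact protocol):

```python
import base64

def image_part(image_bytes, mime="image/png"):
    # Encode an image as a data-URL content part (OpenAI-style chat format).
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{b64}"}}

def build_icl_prompt(task, classes, examples, query_bytes):
    """Assemble a few-shot (in-context) classification prompt.

    `examples` is a list of (image_bytes, label) pairs. No model weights
    are updated: the labeled samples live entirely inside the prompt.
    """
    messages = [{"role": "system",
                 "content": f"{task} Answer with one of: {', '.join(classes)}."}]
    for img, label in examples:
        messages.append({"role": "user",
                         "content": [{"type": "text", "text": "Example tile:"},
                                     image_part(img)]})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user",
                     "content": [{"type": "text", "text": "Classify this tile:"},
                                 image_part(query_bytes)]})
    return messages

# Hypothetical usage with placeholder bytes standing in for histology tiles:
msgs = build_icl_prompt(
    "You are a pathology assistant classifying colorectal tissue tiles.",
    ["tumor", "stroma", "normal"],
    [(b"fake-png-tile-1", "tumor"), (b"fake-png-tile-2", "normal")],
    b"fake-png-query",
)
```

The resulting `msgs` list could then be sent to a vision-language chat endpoint (e.g. `client.chat.completions.create(model="gpt-4o", messages=msgs)`); swapping the example pairs changes the task without any retraining.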