Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, USA.
Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Neuropathol Appl Neurobiol. 2024 Aug;50(4):e12997. doi: 10.1111/nan.12997.
Recent advances in artificial intelligence, particularly large language models such as GPT-4Vision (GPT-4V), a derivative feature of ChatGPT, have expanded the potential for medical image interpretation. This study evaluates the accuracy of GPT-4V in classifying histopathological images and compares its performance with that of a traditional convolutional neural network (CNN).
We used 1520 images, including haematoxylin and eosin staining and tau immunohistochemistry, from patients with various neurodegenerative diseases, such as Alzheimer's disease (AD), progressive supranuclear palsy (PSP) and corticobasal degeneration (CBD). We assessed GPT-4V's performance using multi-step prompts to determine how textual context influences image interpretation. We also employed few-shot learning to improve GPT-4V's diagnostic performance in classifying three specific tau lesions (astrocytic plaques, neuritic plaques and tufted astrocytes) and compared the outcomes with the CNN model YOLOv8.
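The few-shot setup described above can be illustrated with a minimal sketch of how a k-shot prompt might be assembled: an instruction, k labelled example images, then the unlabelled query image. This is not the authors' code. The `LabelledImage` type, the `build_few_shot_prompt` function, the prompt wording, the file paths and the generic "parts" structure are all hypothetical, and the sketch assumes that "20-shot" means 20 labelled examples in total, which the abstract does not specify.

```python
import random
from dataclasses import dataclass

# The three tau lesion classes named in the abstract.
LESION_CLASSES = ("astrocytic plaque", "neuritic plaque", "tufted astrocyte")


@dataclass(frozen=True)
class LabelledImage:
    """Hypothetical container for one labelled example image."""
    path: str   # illustrative file path, not from the study
    label: str  # expected to be one of LESION_CLASSES


def build_few_shot_prompt(examples, query_path, shots, seed=0):
    """Assemble a k-shot prompt as a list of generic text/image parts.

    Assumption: `shots` is the total number of labelled examples shown
    before the query image. shots=0 yields a zero-shot prompt.
    """
    if shots < 0 or shots > len(examples):
        raise ValueError("shots must be between 0 and len(examples)")
    rng = random.Random(seed)            # fixed seed for reproducibility
    chosen = rng.sample(examples, shots)

    instruction = (
        "Classify the tau lesion in the final image as one of: "
        + ", ".join(LESION_CLASSES) + "."
    )
    parts = [{"type": "text", "text": instruction}]
    for example in chosen:
        # Each demonstration is an image followed by its label.
        parts.append({"type": "image", "path": example.path})
        parts.append({"type": "text", "text": f"Label: {example.label}"})
    # The query image is left unlabelled for the model to complete.
    parts.append({"type": "image", "path": query_path})
    parts.append({"type": "text", "text": "Label:"})
    return parts


if __name__ == "__main__":
    pool = [
        LabelledImage(f"img_{i}.png", LESION_CLASSES[i % 3]) for i in range(30)
    ]
    prompt = build_few_shot_prompt(pool, "query.png", shots=20)
    print(len(prompt), "prompt parts")
```

The resulting list would still need to be converted into whatever request format a given vision-language model expects; that step is deliberately left out because it depends on the specific service.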
GPT-4V accurately recognised staining techniques and tissue origin but struggled with specific lesion identification. Its interpretation of images was notably influenced by the provided textual context, which sometimes led to diagnostic inaccuracies. For instance, when GPT-4V was presented with images of the motor cortex, its diagnosis shifted inappropriately from AD to CBD or PSP. However, few-shot learning markedly improved GPT-4V's diagnostic capabilities, raising accuracy from 40% with zero-shot learning to 90% with 20-shot learning. This matched the performance of YOLOv8, which required 100-shot learning to achieve the same accuracy.
Although GPT-4V faces challenges in independently interpreting histopathological images, few-shot learning significantly improves its performance. This approach is especially promising for neuropathology, where acquiring extensive labelled datasets is often challenging.