Ding Liya, Fan Lei, Shen Miao, Wang Yawen, Sheng Kaiqin, Zou Zijuan, An Huimin, Jiang Zhinong
Department of Pathology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China.
Department of Pathology, Ninghai County Traditional Chinese Medicine Hospital, Ningbo, China.
Front Med (Lausanne). 2025 Jan 23;11:1507203. doi: 10.3389/fmed.2024.1507203. eCollection 2024.
Chat Generative Pretrained Transformer (ChatGPT) is a large language model (LLM) developed by OpenAI, known for its extensive knowledge base and interactive capabilities. These attributes make it a valuable tool in the medical field, particularly for tasks such as answering medical questions, drafting clinical notes, and optimizing the generation of radiology reports. However, maintaining accuracy in medical contexts is the main challenge in using GPT-4 in a clinical setting. This study investigates the accuracy of GPT-4, which can process both text and image inputs, in generating diagnoses from pathological images.
This study analyzed 44 histopathological images from 16 organs and 100 colorectal biopsy photomicrographs. The initial evaluation was conducted using the standard GPT-4 model in January 2024, with a subsequent re-evaluation performed in July 2024. The diagnostic accuracy of GPT-4 was assessed by comparing its outputs to a reference standard using statistical measures. Additionally, four pathologists independently reviewed the same images to compare their diagnoses with the model's outputs. Both scanned and photographed images were tested to evaluate GPT-4's generalization ability across different image types.
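The comparison against a reference standard can be illustrated with a minimal sketch. The labels below are invented for illustration and are not data from this study, and the function name `accuracy_and_sensitivity` is hypothetical. The sketch assumes one reference label and one model label per image.

```python
def accuracy_and_sensitivity(reference, predicted, positive):
    """Return (accuracy, sensitivity) of predicted labels against a reference standard.

    `positive` names the class whose sensitivity (true-positive rate) is wanted.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")

    # Accuracy: share of images where the model label matches the reference.
    correct = sum(r == p for r, p in zip(reference, predicted))
    accuracy = correct / len(reference)

    # Sensitivity: share of reference-positive images the model also called positive.
    true_pos = sum(r == positive and p == positive for r, p in zip(reference, predicted))
    false_neg = sum(r == positive and p != positive for r, p in zip(reference, predicted))
    denominator = true_pos + false_neg
    sensitivity = true_pos / denominator if denominator else float("nan")
    return accuracy, sensitivity


# Invented example labels (HGD = high-grade dysplasia); NOT study data.
reference = ["adenocarcinoma", "HGD", "adenocarcinoma", "HGD",
             "adenocarcinoma", "HGD", "HGD", "adenocarcinoma"]
predicted = ["adenocarcinoma", "adenocarcinoma", "adenocarcinoma", "adenocarcinoma",
             "adenocarcinoma", "HGD", "adenocarcinoma", "adenocarcinoma"]

accuracy, sensitivity = accuracy_and_sensitivity(reference, predicted, "adenocarcinoma")
print(f"accuracy={accuracy:.3f}, sensitivity={sensitivity:.3f}")
```

In this invented example every adenocarcinoma is detected, so sensitivity is high even though several high-grade dysplasia images are over-called, which lowers overall accuracy.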
GPT-4 achieved an overall accuracy of 0.64 in identifying tumor images and tissue origins. For colon polyp classification, accuracy ranged from 0.57 to 0.75 across subtypes. The model achieved an accuracy of 0.88 in distinguishing low-grade from high-grade dysplasia and 0.75 in distinguishing high-grade dysplasia from adenocarcinoma, with high sensitivity in detecting adenocarcinoma. Agreement between the initial and follow-up evaluations was slight to moderate, with kappa values ranging from 0.204 to 0.375.
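Agreement between two evaluation rounds is commonly summarized with Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The abstract does not state which kappa variant was used, so the following is a sketch that assumes the unweighted two-rater form; the labels are invented for illustration and the function name `cohen_kappa` is hypothetical.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two label sequences over the same items."""
    if len(first) != len(second) or not first:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(first)

    # Observed agreement: fraction of items given the same label in both rounds.
    observed = sum(a == b for a, b in zip(first, second)) / n

    # Chance agreement: sum over labels of the product of marginal proportions.
    counts_first, counts_second = Counter(first), Counter(second)
    labels = set(counts_first) | set(counts_second)
    expected = sum(counts_first[label] * counts_second[label] for label in labels) / n**2

    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one label is ever used
    return (observed - expected) / (1.0 - expected)


# Invented labels for an initial and a follow-up round; NOT study data.
round_one = ["adenoma", "adenoma", "hyperplastic", "hyperplastic", "adenoma",
             "hyperplastic", "adenoma", "hyperplastic", "adenoma", "hyperplastic"]
round_two = ["adenoma", "adenoma", "hyperplastic", "hyperplastic", "adenoma",
             "adenoma", "hyperplastic", "hyperplastic", "adenoma", "adenoma"]

kappa = cohen_kappa(round_one, round_two)
print(f"kappa={kappa:.3f}")
```

Here the two rounds agree on 7 of 10 items, chance agreement is 0.5, and kappa is therefore about 0.4, well below the raw agreement of 0.7.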
GPT-4 demonstrates the ability to diagnose pathological images, showing improved performance over earlier versions. Its diagnostic accuracy in cancer is comparable to that of pathology residents. These findings suggest that GPT-4 holds promise as a supportive tool in pathology diagnostics, offering the potential to assist pathologists in routine diagnostic workflows.