Division of Pathology, Chulabhorn International College of Medicine, Thammasat University, Pathum Thani, Thailand; Division of Pathology, Thammasat University Hospital, Pathum Thani, Thailand.
Division of Pathology, Thammasat University Hospital, Pathum Thani, Thailand.
Ann Diagn Pathol. 2024 Dec;73:152359. doi: 10.1016/j.anndiagpath.2024.152359. Epub 2024 Jul 2.
This study aimed to evaluate the performance of a customized Chat Generative Pre-Trained Transformer (ChatGPT), referred to as GPT, against pathology residents in providing microscopic descriptions and diagnosing diseases from histopathological images. A dataset of representative photomicrographs covering 70 diseases across 14 organ systems was analyzed by a customized version of ChatGPT-4 (GPT-4) and by pathology residents. Two pathologists independently evaluated the microscopic descriptions and diagnoses using a predefined scoring system (0-4 for microscopic descriptions and 0-2 for pathological diagnoses), with higher scores indicating greater accuracy. Microscopic descriptions that received perfect scores, i.e., those containing all relevant keywords and findings, were then presented to the standard version of ChatGPT to assess its diagnostic capability based on the descriptions alone. GPT-4 showed consistent microscopic description and diagnosis scores across five rounds, achieving median scores of 50 % and 48.6 %, respectively. Its performance was nevertheless inferior to that of junior and senior pathology residents (description scores of 73.9 % and 93.9 %, and diagnosis scores of 63.9 % and 87.9 %, respectively). When the standard version of ChatGPT was assessed on the residents' microscopic descriptions, it correctly diagnosed 35 (87.5 %) of the cases described by junior residents and 44 (68.8 %) of those described by senior residents, provided that the initial descriptions contained the relevant keywords and findings. While GPT-4 can accurately interpret some histopathological images, its overall performance is currently inferior to that of pathology residents. However, ChatGPT's ability to diagnose diseases accurately from resident-written descriptions suggests that this technology could serve as a valuable support tool in pathology diagnostics.
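To make the score-aggregation arithmetic concrete, the following is a minimal Python sketch, not taken from the paper, of how per-case rubric scores (0-4 for descriptions, 0-2 for diagnoses) could be converted into the percentage-of-maximum figures and five-round medians reported above. All function names and the sample values are illustrative assumptions.

```python
from statistics import median

# Rubric maxima per case, as defined in the study's scoring system.
DESCRIPTION_MAX = 4
DIAGNOSIS_MAX = 2

def percent_score(case_scores, max_per_case):
    """Express total raw rubric points as a percentage of the maximum possible."""
    return 100 * sum(case_scores) / (max_per_case * len(case_scores))

# Five evaluation rounds; each inner list holds per-case scores
# (the study used 70 cases per round; values here are made up).
rounds_description = [[2, 3, 1, 4, 0] for _ in range(5)]
rounds_diagnosis = [[1, 2, 0, 2, 1] for _ in range(5)]

median_description = median(percent_score(r, DESCRIPTION_MAX) for r in rounds_description)
median_diagnosis = median(percent_score(r, DIAGNOSIS_MAX) for r in rounds_diagnosis)

print(f"median description score: {median_description:.1f} %")
print(f"median diagnosis score: {median_diagnosis:.1f} %")
```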