Yang Xiongwen, Zhang Yun, Jiang Jinyan, Chen Zhijun, Bai Rinasu, Yuan Zihao, Dong Longyan, Xiao Yi, Liu Di, Deng Huiyin, Huang Jian, Shi Huiyou, Liu Dan, Liang Maoli, Tang WeiJuan, Xu Chuan
Department of Thoracic Surgery, Guizhou Provincial People's Hospital, Guiyang, Guizhou, China.
NHC Key Laboratory of Pulmonary Immunological Diseases, Guizhou Provincial People's Hospital, Guiyang, Guizhou, China.
Digit Health. 2025 May 29;11:20552076251346703. doi: 10.1177/20552076251346703. eCollection 2025 Jan-Dec.
Accurate pathology reports are crucial for the diagnosis and treatment planning of cancer patients. However, these reports are prone to errors due to time pressures, subjective interpretation, and inconsistencies among professionals. Addressing these errors is vital for improving oncology care outcomes. Artificial intelligence (AI) systems, such as GPT-4, offer the potential to enhance diagnostic accuracy and efficiency.
A total of 700 malignant tumor pathology reports were collected from four hospitals. Of these, 350 reports had deliberate errors introduced by a senior pathologist to mimic real-world reporting challenges. Error detection performance was evaluated by comparing GPT-4 with six human pathologists (two senior pathologists, two attending pathologists, and two residents). Key metrics were error detection rates with Wilson confidence intervals and processing time per report.
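For readers unfamiliar with the Wilson interval used for the detection rates, the following is a minimal sketch of the standard Wilson score interval for a binomial proportion. The function name `wilson_interval` and its parameters are illustrative assumptions, not the authors' code, and the study may have used a variant (for example, with continuity correction), so recomputed bounds can differ slightly from the reported ones.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction).

    Hypothetical helper for illustration; not taken from the study.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")

    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n

    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Example: 382 detected errors out of 400 (counts as given in the abstract).
    low, high = wilson_interval(382, 400)
    print(f"{low:.3f} to {high:.3f}")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves well for proportions near 0 or 1, which matters for rates such as a false-positive rate of a few tenths of a percent.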
GPT-4 detected 88% of errors (350/400; 95% CI: [84, 91]), compared with 95% for the top senior pathologist (382/400; 95% CI: [93, 97]). GPT-4 significantly reduced the average processing time to 4.03 seconds per report, compared with 65.64 seconds for the fastest human pathologist. However, GPT-4 had a higher false-positive rate (2.3%; 95% CI: [1.52, 3.01]) than the best-performing senior pathologist (0.3%; 95% CI: [0.01, 0.91]).
GPT-4 demonstrates substantial potential for improving the efficiency and accuracy of pathology error detection, which could accelerate clinical workflows and enhance cancer diagnostics. However, its higher false-positive rate underscores the need for human oversight to ensure safe implementation in clinical practice.