Hu Yanni, Hu Ziyang, Liu Wenjing, Gao Antian, Wen Shanhui, Liu Shu, Lin Zitong
Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China.
Department of Stomatology, Shenzhen Longhua District Central Hospital, Shenzhen, People's Republic of China.
BMC Med Inform Decis Mak. 2024 Feb 19;24(1):55. doi: 10.1186/s12911-024-02445-y.
This study aimed to assess the performance of OpenAI's ChatGPT in generating diagnoses based on the chief complaint and cone beam computed tomography (CBCT) radiologic findings.
A total of 102 CBCT reports (48 with dental diseases (DD) and 54 with neoplastic/cystic diseases (N/CD)) were collected. ChatGPT was provided with the chief complaint and CBCT radiologic findings for each case, and its diagnostic outputs were scored on a five-point Likert scale. Diagnostic accuracy was scored on the accuracy of the chief-complaint-related diagnosis and the chief-complaint-unrelated diagnoses (1-5 points); diagnostic completeness on how many accurate diagnoses were included in ChatGPT's output for one case (1-5 points); and text quality on how many text errors were included in ChatGPT's output for one case (1-5 points). For the 54 N/CD cases, the consistency of the diagnoses generated by ChatGPT with the pathological diagnosis was also calculated, and the composition of text errors in ChatGPT's outputs was evaluated.
After subjective rating by expert reviewers on the five-point Likert scale, ChatGPT's final scores for diagnostic accuracy, diagnostic completeness, and text quality across the 102 cases were 3.7, 4.5, and 4.6, respectively. For diagnostic accuracy, it performed significantly better on N/CD cases (3.8/5) than on DD cases (3.6/5). Among the 54 N/CD cases, 21 (38.9%) had a first diagnosis completely consistent with the pathological diagnosis. No text errors were observed in 88.7% of the 390 text items.
ChatGPT showed potential for generating radiographic diagnoses from a chief complaint and radiologic findings. However, its performance varied with task complexity, and its nontrivial error rate necessitates professional oversight.