Lotfian Golnaz, Parekh Keyur, Abdul Sami Mohammed, Suthar Pokhraj P
Department of Diagnostic Radiology and Nuclear Medicine, Rush University Medical Center, Chicago, USA.
Cureus. 2024 Nov 15;16(11):e73741. doi: 10.7759/cureus.73741. eCollection 2024 Nov.
Recent advancements in natural language processing (NLP) have profoundly transformed the medical industry, enhancing large-cohort data analysis, improving diagnostic capabilities, and streamlining clinical workflows. Among the leading tools in this domain is ChatGPT 4.0 (OpenAI, San Francisco, California, US), a commercial NLP model widely used across various applications. This study evaluates the diagnostic performance of ChatGPT 4.0 in thoracic imaging by assessing its ability to answer diagnostic questions in this field. We used the model to answer multiple-choice questions derived from thoracic imaging scenarios, then performed rigorous statistical analysis to assess its accuracy and variability across subgroups. Overall, the model achieved an accuracy of 84.9% on thoracic radiology questions, but performance varied significantly across subgroups. It excelled in terminology and diagnostic signs, achieving perfect scores, and performed strongly in the intensive care and normal anatomy categories, with accuracies of 90% and 80%, respectively. In pathology subgroups, ChatGPT achieved an average accuracy of 89.1%, excelling in diagnosing infectious pneumonia and atelectasis, though it scored lower in diffuse alveolar disease (66.7%). For disease-related questions, the mean accuracy was 79.1%, with perfect scores in several specific subcategories but notably lower accuracy for vascular disease (50%) and lung cancer (66.7%). In conclusion, while ChatGPT 4.0 shows strong potential in diagnosing thoracic conditions, the variability identified underscores the need for ongoing research and refinement of its transformer architecture to enhance its reliability and applicability in broader clinical and patient care settings.
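The per-subgroup accuracy analysis described above can be sketched as follows. This is a minimal illustration, not the authors' actual analysis code; the subgroup names and question results below are hypothetical, chosen only to mirror the kind of grouped scoring the study reports.

```python
# Hypothetical sketch of grouped accuracy scoring for multiple-choice answers.
# Subgroup labels and results are illustrative, not data from the paper.
from collections import defaultdict

def subgroup_accuracy(results):
    """Compute accuracy per subgroup.

    results: iterable of (subgroup_name, answered_correctly) pairs.
    Returns a dict mapping each subgroup to its fraction of correct answers.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, is_correct in results:
        total[group] += 1
        correct[group] += int(is_correct)
    return {g: correct[g] / total[g] for g in total}

# Toy example: two questions per (hypothetical) subgroup.
toy_results = [
    ("terminology", True), ("terminology", True),
    ("intensive care", True), ("intensive care", False),
    ("vascular disease", False), ("vascular disease", True),
]
print(subgroup_accuracy(toy_results))
```

Aggregating per subgroup rather than reporting only the overall figure is what exposes the variability the abstract emphasizes, e.g. a high overall accuracy coexisting with weak vascular-disease performance.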