Geneş Muhammet, Deveci Bülent
Cardiology Residency, Department of Cardiology, Sincan Training and Research Hospital, Ankara 06930, Turkey.
Diagnostics (Basel). 2024 Dec 4;14(23):2731. doi: 10.3390/diagnostics14232731.
Artificial intelligence (AI) tools, such as ChatGPT, are gaining attention for their potential in supporting clinical decisions. This study evaluates the performance of ChatGPT-4o in acute cardiological cases compared to cardiologists and emergency physicians. Twenty acute cardiological scenarios were used to compare the responses of ChatGPT-4o, cardiologists, and emergency physicians in terms of accuracy, completeness, and response time. Statistical analyses included the Kruskal-Wallis H test and post hoc comparisons using the Mann-Whitney U test with Bonferroni correction. ChatGPT-4o and cardiologists both achieved 100% correct response rates, while emergency physicians showed lower accuracy. ChatGPT-4o provided the fastest responses and obtained the highest accuracy and completeness scores. Statistically significant differences were found between ChatGPT-4o and emergency physicians (p < 0.001), and between cardiologists and emergency physicians (p < 0.001). A Cohen's kappa value of 0.92 indicated a high level of inter-rater agreement. ChatGPT-4o outperformed human clinicians in accuracy, completeness, and response time, highlighting its potential as a clinical decision support tool. However, human oversight remains essential to ensure safe AI integration in healthcare settings.
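The statistical workflow described above (an omnibus Kruskal-Wallis H test followed by pairwise Mann-Whitney U tests with Bonferroni correction) can be sketched as follows. This is a minimal illustration using SciPy; the group names mirror the study's three responder groups, but the score values are hypothetical placeholders, not the study's data.

```python
# Sketch of the abstract's analysis: Kruskal-Wallis omnibus test,
# then post hoc pairwise Mann-Whitney U tests with Bonferroni correction.
# Scores below are hypothetical illustrative ratings, not the study's data.
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

groups = {
    "chatgpt4o":  [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "cardiology": [5, 5, 4, 5, 5, 5, 4, 5, 5, 5],
    "emergency":  [3, 4, 3, 2, 4, 3, 3, 4, 2, 3],
}

# Omnibus test across all three groups
h_stat, p_omnibus = kruskal(*groups.values())
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_omnibus:.4f}")

# Post hoc pairwise comparisons; Bonferroni divides alpha by the
# number of comparisons (3 pairs -> corrected alpha of ~0.0167).
pairs = list(combinations(groups, 2))
alpha_corrected = 0.05 / len(pairs)
for a, b in pairs:
    u_stat, p = mannwhitneyu(groups[a], groups[b])
    print(f"{a} vs {b}: U = {u_stat:.1f}, p = {p:.4f}, "
          f"significant at corrected alpha = {p < alpha_corrected}")
```

With three groups there are three pairwise comparisons, so the Bonferroni-corrected threshold is 0.05 / 3; a pairwise p-value counts as significant only below that corrected level.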