Accuracy of Artificial Intelligence Based Chatbots in Analyzing Orthopedic Pathologies: An Experimental Multi-Observer Analysis.

Author Information

Gehlen Tobias, Joost Theresa, Solbrig Philipp, Stahnke Katharina, Zahn Robert, Jahn Markus, Adl Amini Dominik, Back David Alexander

Affiliations

Center for Musculoskeletal Surgery, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 13353 Berlin, Germany.

Move Ahead-Foot Ankle and Sportsclinic, 10117 Berlin, Germany.

Publication Information

Diagnostics (Basel). 2025 Jan 19;15(2):221. doi: 10.3390/diagnostics15020221.

Abstract

The rapid development of artificial intelligence (AI) is impacting the medical sector by offering new possibilities for faster and more accurate diagnoses. Symptom checker apps show potential for supporting patient decision-making in this regard. Whether the AI-based decision-making of symptom checker apps shows better performance in diagnostic accuracy and urgency assessment compared to physicians remains unclear. Therefore, this study aimed to investigate the performance of existing symptom checker apps in orthopedic and traumatology cases compared to physicians in the field. Thirty fictitious case vignettes of common conditions in trauma surgery and orthopedics were retrospectively examined by four orthopedic and traumatology specialists and four different symptom checker apps for diagnostic accuracy and the recommended urgency of measures. Based on the estimations provided by the physicians and the individual symptom checker apps, the percentage of correct diagnoses and appropriate assessments of treatment urgency was calculated as mean ± standard deviation (SD) in %. Data were analyzed statistically for accuracy and correlation between the apps and physicians using a nonparametric Spearman's correlation test (p < 0.05). The physicians provided the correct diagnosis in 84.4 ± 18.4% of cases (range: 53.3 to 96.7%), and the symptom checker apps in 35.8 ± 1.0% of cases (range: 26.7 to 54.2%). The agreement in the accuracy of the diagnoses varied from low to high (physicians vs. physicians: Spearman's ρ: 0.143 to 0.538; physicians vs. apps: Spearman's ρ: 0.007 to 0.358) depending on the different physicians and apps. In relation to the whole population, the physicians correctly assessed the urgency level in 70.0 ± 4.7% (range: 66.7 to 73.3%) and the apps in 20.6 ± 5.6% (range: 10.8 to 37.5%) of cases. The agreement on the accuracy of estimating urgency levels was moderate to high between and within physicians and individual apps.
AI-based symptom checker apps for diagnosis in orthopedics and traumatology do not yet provide a more accurate analysis regarding diagnosis and urgency evaluation than physicians. However, there is a broad variation in the accuracy between different digital tools. Altogether, this field of AI application shows excellent potential and should be further examined in future studies.
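The analysis described above — per-rater accuracy reported as mean ± SD, and inter-rater agreement via Spearman's rank correlation on per-case correctness — can be sketched in a few lines of Python. The ratings below are hypothetical (1 = correct diagnosis, 0 = incorrect), shown for ten vignettes rather than the study's thirty; Spearman's ρ is computed from scratch here (Pearson correlation of tie-averaged ranks) to make the calculation explicit.

```python
from statistics import mean

def average_ranks(xs):
    # Assign 1-based average ranks, handling ties (binary ratings are all ties).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the tied positions
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(a, b):
    # Spearman's rho = Pearson correlation of the two rank vectors.
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

# Hypothetical per-case correctness for two raters (e.g. a physician and an app).
rater_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
rater_b = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]

accuracy_a = mean(rater_a) * 100  # per-rater accuracy in %
rho = spearman_rho(rater_a, rater_b)
print(f"rater A accuracy: {accuracy_a:.1f}%, Spearman rho: {rho:.3f}")
```

On binary correctness data, this rank correlation coincides with the phi coefficient for the 2×2 agreement table, which is why it is a reasonable agreement measure between raters scoring the same vignettes.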

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a2aa/11764310/c127a5c37bd0/diagnostics-15-00221-g001.jpg
