Ben-Shabat Niv, Sloma Ariel, Weizman Tomer, Kiderman David, Amital Howard
Sackler Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel.
Department of Medicine 'B', Sheba Medical Center, Ramat Gan, Israel.
JMIR Med Inform. 2021 Nov 30;9(11):e32507. doi: 10.2196/32507.
Diagnostic decision support systems (DDSS) are computer programs aimed at improving health care by supporting clinicians in diagnostic decision-making. Previous studies of DDSS have demonstrated their ability to enhance clinicians' diagnostic skills, prevent diagnostic errors, and reduce hospitalization costs. Despite these potential benefits, their use in clinical practice remains limited, highlighting the need for new and improved products.
The aim of this study was to conduct a preliminary analysis of the diagnostic performance of "Kahun," a new artificial intelligence-driven diagnostic tool.
Diagnostic performance was evaluated by the program's ability to "solve" clinical cases from United States Medical Licensing Examination Step 2 Clinical Skills board exam simulations drawn from the case banks of 3 leading preparation companies. Each case included 3 expected differential diagnoses. The cases were entered into the Kahun platform by 3 blinded junior physicians. For each case, the presence and rank of the correct diagnoses within the generated differential diagnosis list were recorded. Diagnostic performance was measured in two ways: first, as diagnostic sensitivity, and second, as case-specific success rates that represent diagnostic comprehensiveness.
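The two measures described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the input format (for each case, the 1-based rank of each expected diagnosis in the generated list, or `None` if it was not suggested) and the function names `diagnostic_sensitivity` and `case_success_rate` are hypothetical.

```python
from typing import Optional, Sequence

# Hypothetical input format: for each case, the 1-based rank of each of its
# expected diagnoses in the generated differential list, or None if absent.
CaseRanks = Sequence[Optional[int]]


def found_within(rank: Optional[int], k: int) -> bool:
    """True if a diagnosis was suggested within the top-k listed diagnoses."""
    return rank is not None and rank <= k


def diagnostic_sensitivity(cases: Sequence[CaseRanks], k: int) -> float:
    """Share of all expected diagnoses, pooled across cases, found in the top k."""
    ranks = [r for case in cases for r in case]
    return sum(found_within(r, k) for r in ranks) / len(ranks)


def case_success_rate(cases: Sequence[CaseRanks], k: int, min_found: int) -> float:
    """Share of cases with at least `min_found` expected diagnoses in the top k."""
    hits = [sum(found_within(r, k) for r in case) >= min_found for case in cases]
    return sum(hits) / len(cases)


if __name__ == "__main__":
    # Toy data (invented for illustration): 3 cases, 3 expected diagnoses each.
    cases = [[1, 4, None], [2, 3, 12], [7, 15, 25]]
    print(diagnostic_sensitivity(cases, k=10))            # 5 of 9 diagnoses
    print(case_success_rate(cases, k=20, min_found=3))    # all 3 found: 1 of 3 cases
    print(case_success_rate(cases, k=10, min_found=2))    # at least 2 of 3: 2 of 3 cases
```

Pooling ranks across cases gives the per-diagnosis sensitivity, while counting hits per case gives the stricter, case-level view of comprehensiveness.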
The study included 91 clinical cases with 78 different chief complaints and a mean of 38 (SD 8) findings per case. The total number of expected diagnoses was 272, of which 174 were distinct (some appeared more than once). Of the 272 expected diagnoses, 231 (87.5%; 95% CI 76-99) were suggested within the top 20 listed diagnoses, 209 (76.8%; 95% CI 66-87) within the top 10, and 168 (61.8%; 95% CI 52-71) within the top 5. The median rank of correct diagnoses was 3 (IQR 2-6). Of the 91 cases, all 3 expected diagnoses were suggested within the top 20 listed diagnoses in 62 (68%; 95% CI 59-78), within the top 10 in 44 (48%; 95% CI 38-59), and within the top 5 in 24 (26%; 95% CI 17-35). At least 2 of the 3 expected diagnoses were suggested within the top 20 in 87 of the 91 cases (96%; 95% CI 91-100), within the top 10 in 78 (86%; 95% CI 79-93), and within the top 5 in 61 (67%; 95% CI 57-77).
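The abstract does not state how the confidence intervals were computed. As a hedged sketch, a normal-approximation (Wald) interval for a proportion, p ± 1.96·sqrt(p(1 − p)/n) with n = 91, matches the case-level intervals reported above once rounded to whole percentages (for example, 62 of 91 cases gives about 59%-78%). The per-diagnosis intervals are much wider than this formula gives with n = 272, so they were presumably derived differently and are not reproduced here. The function name `wald_ci` is hypothetical.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) CI for a proportion, clipped to [0, 1].

    Assumption: this is one plausible method, not necessarily the authors'.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Case-level counts out of 91 cases, as reported in the abstract.
    for successes in (62, 44, 24, 87, 78, 61):
        low, high = wald_ci(successes, 91)
        print(f"{successes}/91: {successes / 91:.0%} (95% CI {low:.0%}-{high:.0%})")
```

The Wald interval is simple but can behave poorly near 0% or 100%; the clipping to [0, 1] handles the 87/91 case, whose upper bound would otherwise approach 100%.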
The diagnostic support tool evaluated in this study demonstrated good diagnostic accuracy and comprehensiveness, and it was able to handle a wide range of clinical findings.