Differential diagnosis generators: an evaluation of currently available computer programs.

Affiliation

Department of Emergency Medicine, Lehigh Valley Health Network, Allentown, PA, USA.

Publication information

J Gen Intern Med. 2012 Feb;27(2):213-9. doi: 10.1007/s11606-011-1804-8.

DOI:10.1007/s11606-011-1804-8

PMID:21789717

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3270234/

Abstract

BACKGROUND

Differential diagnosis (DDX) generators are computer programs that generate a DDX based on various clinical data.

OBJECTIVE

We identified evaluation criteria through consensus, applied these criteria to describe the features of DDX generators, and tested performance using cases from the New England Journal of Medicine (NEJM©) and the Medical Knowledge Self Assessment Program (MKSAP©).

METHODS

We first identified evaluation criteria by consensus. We then performed Google® and PubMed searches to identify DDX generators. To be included, a DDX generator had to do the following: generate a list of potential diagnoses rather than text or article references; rank or indicate critical diagnoses that need to be considered or eliminated; accept at least two signs, symptoms or disease characteristics; provide the ability to compare the clinical presentations of diagnoses; and provide diagnoses in general medicine. The evaluation criteria were then applied to the included DDX generators. Lastly, the performance of the DDX generators was tested with findings from 20 test cases. Performance on each case was scored from one to five, with a score of five indicating presence of the exact diagnosis. Mean scores and confidence intervals were calculated.
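As a rough illustration of the last step, the sketch below computes a mean score, a 95% confidence interval, and the count of exact matches for one program. It assumes a t-based interval over 20 cases; the abstract does not say which interval method was used, and the scores shown are hypothetical, not data from the study. The names `mean_with_ci` and `example_scores` are illustrative only.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 19 degrees of freedom
# (n = 20 cases). Assumption: the abstract does not state the CI method.
T_CRIT_95_DF19 = 2.093


def mean_with_ci(scores, t_crit=T_CRIT_95_DF19):
    """Return (mean, lower, upper) for a t-based confidence interval."""
    n = len(scores)
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical per-case scores for one program (NOT data from the study).
example_scores = [4, 5, 4, 1, 3, 5, 2, 5, 4, 1, 5, 3, 5, 1, 4, 5, 2, 5, 3, 1]

mean, lower, upper = mean_with_ci(example_scores)
# A score of five means the exact diagnosis was present in the list.
exact_matches = sum(1 for s in example_scores if s == 5)
print(f"mean = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f}), "
      f"exact matches = {exact_matches}")
```

With only 20 cases per program the intervals are wide, so small differences between mean scores should be read with caution.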

KEY RESULTS

Twenty-three programs were initially identified and four met the inclusion criteria. These four programs were evaluated using the consensus criteria, which included the following: input method; mobile access; filtering and refinement; lab values, medications, and geography as diagnostic factors; evidence-based medicine (EBM) content; references; and drug information content source. The mean scores (95% confidence interval) from performance testing on a five-point scale were Isabel© 3.45 (2.53-4.37), DxPlain® 3.45 (2.63-4.27), Diagnosis Pro® 2.65 (1.75-3.55) and PEPID™ 1.70 (0.71-2.69). The number of exact matches paralleled the mean score finding.

CONCLUSIONS

Consensus criteria for DDX generator evaluation were developed. Application of these criteria as well as performance testing supports the use of DxPlain® and Isabel© over the other currently available DDX generators.
