Adlassnig KP, Scheithauer W
Department of Medical Computer Sciences, University of Vienna, Austria.
Comput Biomed Res. 1989 Aug;22(4):297-313. doi: 10.1016/0010-4809(89)90026-8.
This paper presents a performance evaluation of the diagnostic accuracy of the medical expert system CADIAG-2/PANCREAS. The study included 47 clinical cases from a university hospital with 51 diagnoses of pancreatic diseases (four patients had two pancreatic diseases). The histologically or clinically confirmed diagnoses served as the gold standard. Performance was studied along three lines: (a) each case was evaluated twice, first by restricting patient data to history, physical examination, and basic laboratory tests, and second by using the complete data set, which also included special laboratory tests, ultrasound (US), X-ray, CT scan, ECG, and biopsy, if available; (b) with respect to CADIAG-2's hypothesis generation, each evaluation series was also carried out twice, first by testing whether the gold standard was the first diagnosis in the ranked list of hypotheses, and second by testing whether the gold standard was among the hypotheses at all; (c) receiver operating characteristic (ROC) curves were determined by varying an internal threshold that controls the extent of CADIAG-2's diagnostic hypothesis generation. The evaluation showed that CADIAG-2's initial list of diagnostic hypotheses, based on patient history, physical examination, and basic laboratory tests, usually already included the gold standard diagnosis; applying CADIAG-2 at a very early stage of the diagnostic process therefore seems achievable. Moreover, given the complete set of the patients' medical data, the gold standard was usually ranked first in the list of hypotheses, except for patients with chronic diseases for whom only unspecific findings were available. The last test series showed that ROC curves not only allow optimal adjustment of the expert system's internal ad hoc decision criteria, such as thresholds, weights, and scores, but also provide a basis for better comparing the performance of different medical expert systems.
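The three evaluation lines described above can be illustrated with a small sketch. It assumes, hypothetically, that the expert system assigns each candidate diagnosis a numeric score in [0, 1] and that the internal threshold keeps only diagnoses scoring at or above it; CADIAG-2's actual inference and scoring are not described in this abstract. The names `Case`, `hypotheses`, `evaluate`, and `roc_points`, the disease labels, and all scores are illustrative assumptions, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical per-diagnosis scores; CADIAG-2's real scoring is not modeled here.
    scores: dict[str, float]
    # Confirmed (gold standard) diagnoses; some patients have two.
    gold: set[str]


def hypotheses(case: Case, threshold: float) -> list[str]:
    """Diagnoses whose score reaches the threshold, ranked by descending score."""
    kept = [(d, s) for d, s in case.scores.items() if s >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [d for d, _ in kept]


def evaluate(cases: list[Case], threshold: float) -> tuple[float, float]:
    """Return (share of cases whose top hypothesis is a gold diagnosis,
    share of gold diagnoses found anywhere among the hypotheses)."""
    total_diagnoses = sum(len(c.gold) for c in cases)
    top1 = 0
    among = 0
    for c in cases:
        ranked = hypotheses(c, threshold)
        if ranked and ranked[0] in c.gold:
            top1 += 1
        among += sum(1 for g in c.gold if g in ranked)
    return top1 / len(cases), among / total_diagnoses


def roc_points(cases, diseases, thresholds):
    """Sweep the threshold; each (case, disease) pair counts as one decision."""
    points = []
    for t in thresholds:
        tp = fp = fn = tn = 0
        for c in cases:
            predicted = set(hypotheses(c, t))
            for d in diseases:
                if d in c.gold:
                    if d in predicted:
                        tp += 1
                    else:
                        fn += 1
                elif d in predicted:
                    fp += 1
                else:
                    tn += 1
        tpr = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
        fpr = fp / (fp + tn) if fp + tn else 0.0  # 1 - specificity
        points.append((t, fpr, tpr))
    return points


# Hypothetical toy data (not from the study).
DISEASES = ["acute pancreatitis", "chronic pancreatitis", "pancreatic carcinoma"]
cases = [
    Case({"acute pancreatitis": 0.9, "chronic pancreatitis": 0.4,
          "pancreatic carcinoma": 0.2}, {"acute pancreatitis"}),
    Case({"acute pancreatitis": 0.3, "chronic pancreatitis": 0.6,
          "pancreatic carcinoma": 0.5}, {"pancreatic carcinoma"}),
    Case({"acute pancreatitis": 0.1, "chronic pancreatitis": 0.7,
          "pancreatic carcinoma": 0.65},
         {"chronic pancreatitis", "pancreatic carcinoma"}),
]

print(evaluate(cases, 0.5))
for point in roc_points(cases, DISEASES, [1.0, 0.8, 0.5, 0.0]):
    print(point)
```

In this toy setup, lowering the threshold lengthens the hypothesis list, which raises the "among the hypotheses" rate and the true-positive rate at the cost of more false positives. Plotting the resulting (FPR, TPR) pairs gives an ROC curve from which an operating threshold can be chosen. Counting the top-ranked hit per case, rather than per diagnosis, is one possible reading of criterion (b) for patients with two diseases.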