Friedman C, Elstein A, Wolf F, Murphy G, Franz T, Fine P, Heckerling P, Miller T
Center for Biomedical Informatics, University of Pittsburgh, USA.
Stud Health Technol Inform. 1998;52 Pt 2:864-8.
Within medical informatics there is widespread interest in computer-based decision support and the evaluation of its impact. It is widely recognized that the measurement of dependent variables, or outcomes, represents the most challenging aspect of this work. This paper describes and reports the reliability and validity of an outcome metric for studies of diagnostic decision support. The results of this study will guide the analytic methods used in our ongoing multi-site study of the effects of decision support on diagnostic reasoning. Our measurement approach conceptualizes the quality of a diagnostic hypothesis set as having two components summed to generate a composite index: a Plausibility Component derived from ratings of each hypothesis in the set, whether correct or incorrect; and a Location Component derived from the location of the correct diagnosis if it appears in the set. The reliability of this metric is determined by the extent of interrater agreement on the plausibility of diagnostic hypotheses. Validity is determined by the extent to which the index generates scores that make sense on inspection (face validity), as well as the extent to which the component scores are non-redundant and discriminate the performance of novices and experts (construct validity). Using data from the pilot and main phases of our ongoing study (n = 124 subjects working 1116 cases), the reliability of our diagnostic quality metric was found to be 0.85-0.88. The metric was found to generate, on inspection, no clearly counterintuitive scores. Using data from the pilot phase of our study (n = 12 subjects working 108 cases), the component scores were moderately correlated (r = 0.68). The composite index, computed by equally weighting both components, was found to discriminate the hypotheses of medical students and attending physicians by 0.97 standard deviation units. Based on these findings, we have adopted this metric for use in our further research exploring the impact of decision support systems on diagnostic reasoning and will make it available to the informatics research community.
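To make the structure of the composite index concrete, the following is a minimal sketch of how a score of this form could be computed. The abstract specifies only that a Plausibility Component (ratings of every hypothesis in the set) and a Location Component (rank of the correct diagnosis, if present) are equally weighted and summed; the rating scale, the reciprocal-rank rule for the location score, and the 0-1 normalization below are illustrative assumptions, not the authors' actual formulas.

from dataclasses import dataclass
from typing import List

@dataclass
class Hypothesis:
    label: str
    plausibility: float  # assumed expert rating on a 1-7 scale (scale not given in the abstract)
    is_correct: bool     # whether this hypothesis matches the case's correct diagnosis

def plausibility_component(hypotheses: List[Hypothesis], max_rating: float = 7.0) -> float:
    """Average plausibility of all hypotheses in the set, scaled to 0-1 (assumed normalization)."""
    if not hypotheses:
        return 0.0
    return sum(h.plausibility for h in hypotheses) / (len(hypotheses) * max_rating)

def location_component(hypotheses: List[Hypothesis]) -> float:
    """Reciprocal rank of the correct diagnosis; 0 if it is absent from the set (assumed rule)."""
    for rank, h in enumerate(hypotheses, start=1):
        if h.is_correct:
            return 1.0 / rank
    return 0.0

def composite_index(hypotheses: List[Hypothesis]) -> float:
    """Equally weighted sum of the two components, as described in the abstract."""
    return 0.5 * plausibility_component(hypotheses) + 0.5 * location_component(hypotheses)

# Example: a three-hypothesis set in which the correct diagnosis is listed second
hypotheses = [
    Hypothesis("viral pneumonia", plausibility=5.0, is_correct=False),
    Hypothesis("pulmonary embolism", plausibility=6.0, is_correct=True),
    Hypothesis("asthma exacerbation", plausibility=3.0, is_correct=False),
]
print(f"composite index: {composite_index(hypotheses):.3f}")

Under this sketch, a set that ranks the correct diagnosis higher, or whose hypotheses are rated more plausible overall, earns a higher composite score, which is the behavior the validity analysis in the abstract (discrimination between students and attending physicians) is checking for.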