Kahn M G, Bailey T C, Steib S A, Fraser V J, Dunagan W C
Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA.
J Am Med Inform Assoc. 1996 Jul-Aug;3(4):258-69. doi: 10.1136/jamia.1996.96413133.
The literature on the performance evaluation of medical expert systems is extensive, yet most of the techniques used in the early stages of system development are inappropriate for deployed expert systems. Because extensive clinical and informatics expertise and resources are required to perform evaluations, efficient yet effective methods of monitoring performance during the long-term maintenance phase of the expert system life cycle must be devised. Statistical process control techniques provide a well-established methodology that can be used to define policies and procedures for continuous, concurrent performance evaluation. Although the field of statistical process control was developed for monitoring industrial processes, its tools, techniques, and theory are easily transferred to the evaluation of expert systems. Statistical process control tools provide convenient visual methods and heuristic guidelines for detecting meaningful changes in expert system performance. The underlying statistical theory provides estimates of the detection capabilities of alternative evaluation strategies. This paper describes a set of statistical process control tools that can be used to monitor the performance of a number of deployed medical expert systems. It describes how p-charts are used in practice to monitor the GermWatcher expert system. The case volume and error rate of GermWatcher are then used to demonstrate how different inspection strategies would perform.
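The abstract does not give the paper's formulas, so the following is only a minimal sketch of a textbook p-chart, assuming the conventional 3-sigma limits around a baseline error proportion. The function names, the baseline rate, and the sample counts are hypothetical illustrations and are not GermWatcher data. The last function estimates, under a simple binomial assumption, how likely a single inspection sample is to fall outside the limits once the true error rate has shifted, which is one way to compare inspection strategies.

```python
import math


def p_chart_limits(p_bar, n, sigmas=3.0):
    """Return (lower, upper) control limits for a sample of n inspected cases.

    p_bar is the assumed in-control (baseline) error proportion.
    Limits are clipped to the valid proportion range [0, 1].
    """
    half_width = sigmas * math.sqrt(p_bar * (1.0 - p_bar) / n)
    return max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)


def out_of_control(errors, inspected, p_bar, sigmas=3.0):
    """Return indices of samples whose error proportion falls outside the limits."""
    flagged = []
    for i, (x, n) in enumerate(zip(errors, inspected)):
        lcl, ucl = p_chart_limits(p_bar, n, sigmas)
        p = x / n
        if p < lcl or p > ucl:
            flagged.append(i)
    return flagged


def detection_probability(p_bar, p_shifted, n, sigmas=3.0):
    """Probability that one sample of size n signals when the true rate is p_shifted.

    Uses the exact binomial distribution; assumes independent cases.
    """
    lcl, ucl = p_chart_limits(p_bar, n, sigmas)
    prob = 0.0
    for x in range(n + 1):
        p = x / n
        if p < lcl or p > ucl:
            prob += math.comb(n, x) * p_shifted**x * (1.0 - p_shifted) ** (n - x)
    return prob


# Hypothetical example: baseline error rate 10%, 100 cases reviewed per period.
errors = [10, 25, 8]
inspected = [100, 100, 100]
print(p_chart_limits(0.10, 100))
print(out_of_control(errors, inspected, 0.10))
print(detection_probability(0.10, 0.30, 100))
```

Calling `detection_probability` with different values of `n` gives a rough sense of the trade-off between reviewing fewer cases per period and noticing a change in error rate promptly.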