Kahn MG, Steib SA, Dunagan WC, Fraser VJ.
Section of Medical Informatics, Division of General Medical Sciences, Washington University School of Medicine, St. Louis, MO, USA.
J Am Med Inform Assoc. 1996 May-Jun;3(3):216-23. doi: 10.1136/jamia.1996.96310635.
Objective: To evaluate the applicability of metrics collected during routine use for monitoring the performance of a deployed expert system.
Design: Two extensive formal evaluations of the GermWatcher (Washington University School of Medicine) expert system were performed approximately six months apart. Deficiencies noted during the first evaluation were corrected via a series of interim changes to the expert system rules, even though the expert system was in routine use. As part of their daily work routine, infection control nurses reviewed expert system output and changed the output results with which they disagreed. The rate of nurse disagreement with expert system output was used as an indirect, or surrogate, metric of expert system performance between formal evaluations. The results of the second evaluation were used to validate the disagreement rate as an indirect performance measure. Based on continued monitoring of user feedback, expert system changes incorporated after the second formal evaluation have resulted in additional improvements in performance.
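The surrogate metric described above is simply the fraction of expert-system outputs that the reviewing nurses overrode. A minimal sketch of that computation, assuming a hypothetical `Review` record (the field names and the example counts are illustrative, not taken from the study):

```python
from dataclasses import dataclass

@dataclass
class Review:
    """One infection-control nurse review of a single expert-system output.

    Hypothetical structure for illustration; not the GermWatcher data model.
    """
    culture_id: str
    nurse_changed_output: bool  # True if the nurse disagreed and edited the result

def disagreement_rate(reviews: list[Review]) -> float:
    """Fraction of expert-system outputs the nurses overrode."""
    if not reviews:
        raise ValueError("no reviews to score")
    disagreed = sum(r.nurse_changed_output for r in reviews)
    return disagreed / len(reviews)

# Illustrative example: 3 of 200 reviewed outputs were changed by a nurse.
reviews = [Review(f"c{i}", i < 3) for i in range(200)]
print(f"{disagreement_rate(reviews):.1%}")  # prints "1.5%"
```

Tracked between formal evaluations, a falling value of this rate after each rule change is what justified treating it as a proxy for true system performance.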
Results: The rate of nurse disagreement with GermWatcher output decreased consistently after each change to the program. The second formal evaluation confirmed a marked improvement in the program's performance, justifying the use of the nurses' disagreement rate as an indirect performance metric.
Conclusions: Metrics collected during the routine use of the GermWatcher expert system can be used to monitor the performance of the expert system. The impact of improvements to the program can be followed using continuous user feedback, without requiring extensive formal evaluations after each modification. When possible, the design of an expert system should incorporate measures of system performance that can be collected and monitored during the routine use of the system.