Shekelle P G, Kahan J P, Bernstein S J, Leape L L, Kamberg C J, Park R E
West Los Angeles Veterans Affairs Medical Center, CA, USA.
N Engl J Med. 1998 Jun 25;338(26):1888-95. doi: 10.1056/NEJM199806253382607.
To assess the overuse and underuse of medical procedures, various methods have been developed, but their reproducibility has not been evaluated. This study estimates the reproducibility of one commonly used method.
We performed a parallel, three-way replication of the RAND-University of California at Los Angeles appropriateness method as applied to two medical procedures, coronary revascularization and hysterectomy. Three nine-member multidisciplinary panels of experts were composed for each procedure by stratified random sampling from a list of experts nominated by the relevant specialty societies. Each panel independently rated the same set of clinical scenarios in terms of the appropriateness of the relevant procedure on a risk-benefit scale ranging from 1 to 9. Final ratings were used to classify the procedure in each scenario as necessary or not necessary (to evaluate underuse) and inappropriate or not inappropriate (to evaluate overuse). Reproducibility was measured by overall agreement and by the kappa statistic. The criteria for underuse and overuse derived from these ratings were then applied to real populations of patients who had undergone coronary revascularization or hysterectomy.
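The two reproducibility measures named above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code. The abstract does not say which multi-rater kappa was used, so Fleiss' kappa is assumed here. The function names and the panel labels are hypothetical (1 = scenario classified as inappropriate, 0 = not inappropriate).

```python
from itertools import combinations


def pairwise_agreement(a, b):
    """Fraction of scenarios on which two panels assign the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("panels must rate the same non-empty set of scenarios")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fleiss_kappa(panels):
    """Fleiss' kappa for several panels labelling the same scenarios.

    Assumption: the abstract only says "three-way kappa"; Fleiss' kappa is
    one common multi-rater choice and may differ from what the study used.
    """
    n_raters = len(panels)
    n_items = len(panels[0])
    if n_raters < 2 or any(len(p) != n_items for p in panels):
        raise ValueError("need at least two panels rating the same scenarios")
    categories = sorted({label for p in panels for label in p})
    # counts[i][j] = number of panels that put scenario i in category j
    counts = [[sum(p[i] == c for p in panels) for c in categories]
              for i in range(n_items)]
    # Observed agreement, averaged over scenarios
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Agreement expected by chance from the pooled category proportions
    p_e = sum(
        (sum(row[j] for row in counts) / (n_items * n_raters)) ** 2
        for j in range(len(categories))
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical labels for ten scenarios; NOT data from the study.
panel_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
panel_b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
panel_c = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

if __name__ == "__main__":
    for x, y in combinations([panel_a, panel_b, panel_c], 2):
        print("pairwise agreement:", pairwise_agreement(x, y))
    print("three-way kappa:", round(fleiss_kappa([panel_a, panel_b, panel_c]), 3))
```

With three panels there are three panel pairs, which is consistent with agreement being reported as three percentages per procedure, alongside a single three-way kappa.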
The rates of agreement among the three coronary-revascularization panels were 95, 94, and 96 percent for inappropriate-use scenarios and 93, 92, and 92 percent for necessary-use scenarios. Agreement among the three hysterectomy panels was 88, 70, and 74 percent for inappropriate-use scenarios. Scenarios involving necessary use of hysterectomy were not assessed. The three-way kappa statistic to detect overuse was 0.52 for coronary revascularization and 0.51 for hysterectomy. The three-way kappa statistic to detect underuse of coronary revascularization was 0.83. Application of individual panels' criteria to real populations of patients resulted in a 100 percent variation in the proportion of cases classified as inappropriate and a 20 percent variation in the proportion of cases classified as necessary.
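Agreement of about 95 percent alongside a kappa near 0.5 can look contradictory. One plausible explanation, which the abstract itself does not state, is that kappa corrects for chance agreement, and chance agreement is high when one category is rare. The sketch below uses a two-panel Cohen's kappa for simplicity, whereas the study reports a three-way statistic. The function name and all counts are hypothetical and chosen only to illustrate the arithmetic.

```python
def cohen_kappa_2x2(both_yes, a_only, b_only, both_no):
    """Return (observed agreement, Cohen's kappa) for a 2x2 agreement table.

    both_yes: scenarios both panels call inappropriate
    a_only / b_only: scenarios only panel A / only panel B calls inappropriate
    both_no: scenarios neither panel calls inappropriate
    """
    n = both_yes + a_only + b_only + both_no
    if n == 0:
        raise ValueError("table is empty")
    p_o = (both_yes + both_no) / n
    a_yes = (both_yes + a_only) / n   # panel A's rate of "inappropriate"
    b_yes = (both_yes + b_only) / n   # panel B's rate of "inappropriate"
    # Chance agreement: both say yes by chance, or both say no by chance
    p_e = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical counts, NOT the study's data: 100 scenarios, few inappropriate.
rare = cohen_kappa_2x2(both_yes=3, a_only=3, b_only=2, both_no=92)

# Same 95% agreement, but with the two categories roughly balanced.
balanced = cohen_kappa_2x2(both_yes=47, a_only=3, b_only=2, both_no=48)

if __name__ == "__main__":
    print("rare category:     agreement=%.2f kappa=%.2f" % rare)
    print("balanced category: agreement=%.2f kappa=%.2f" % balanced)
```

In this illustration both tables have 95 percent raw agreement, yet kappa is about 0.52 when inappropriate scenarios are rare and about 0.90 when the categories are balanced. Under this reading, the lower kappa for overuse than for underuse of coronary revascularization would reflect how rarely scenarios are rated inappropriate rather than poorer panel consistency.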
The appropriateness method is far from perfect. Appropriateness criteria may be useful in comparing levels of appropriate procedures among populations but should not by themselves be used to direct care for individual patients.