Deyo R A, Diehr P, Patrick D L
Department of Medicine, University of Washington, Seattle.
Control Clin Trials. 1991 Aug;12(4 Suppl):142S-158S. doi: 10.1016/s0197-2456(05)80019-4.
Before being introduced into wide use, health status instruments should be evaluated for reliability and validity. Increasingly, they are also tested for responsiveness to important clinical changes. Although standards exist for assessing these properties, confusion and inconsistency arise because multiple statistics are used for the same property; controversy exists over how to measure responsiveness; many statistics are unavailable in common software programs; strategies for measuring these properties vary; and it is often unclear how to define a clinically important change in patient status. Using data from a clinical trial of therapy for back pain, we demonstrate the calculation of several statistics for measuring reproducibility and responsiveness, and show the relationships among them. Simple computational guides for several statistics are provided. We conclude that reproducibility should generally be quantified with the intraclass correlation coefficient rather than the more common Pearson r. Assessing reproducibility by retest at 1- to 2-week intervals (rather than a shorter interval) may result in more realistic estimates of the variability to be observed among control subjects in a longitudinal study. Instrument responsiveness should be quantified using indicators of effect size, a modified effect size statistic proposed by Guyatt, or receiver operating characteristic (ROC) curves that describe how well various score changes can distinguish improved from unimproved patients.
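The statistics named in the abstract can be illustrated with a minimal Python sketch. Several things here are assumptions, since the abstract does not give formulas: the one-way random-effects form of the intraclass correlation coefficient, the effect size taken as mean change divided by the baseline standard deviation, the Guyatt-type statistic taken as mean change in improved patients divided by the standard deviation of change in stable patients, and the convention that a larger change score means more improvement. All function names and scores are hypothetical and are not data from the back pain trial.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between paired test and retest scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def icc_oneway(x, y):
    """One-way random-effects ICC for two measurements per subject (assumed form)."""
    n, k = len(x), 2
    subject_means = [(a + b) / 2 for a, b in zip(x, y)]
    grand = mean(subject_means)
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(x, y, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def effect_size(changes, baseline):
    """Mean score change divided by the SD of baseline scores (assumed definition)."""
    return mean(changes) / stdev(baseline)


def guyatt_statistic(improved_changes, stable_changes):
    """Mean change in improved patients over SD of change in stable patients (assumed)."""
    return mean(improved_changes) / stdev(stable_changes)


def roc_auc(improved_changes, unimproved_changes):
    """Area under the ROC curve: probability that a random improved patient has a
    larger change score than a random unimproved patient (ties count one half)."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in improved_changes for b in unimproved_changes)
    return wins / (len(improved_changes) * len(unimproved_changes))


# Hypothetical test-retest scores: every retest score is shifted up by 3 points.
test_scores = [10, 12, 14, 16, 18]
retest_scores = [s + 3 for s in test_scores]

# Hypothetical change scores for improved and unimproved (stable) patients.
improved = [5, 6, 4, 7]
unimproved = [1, 2, 4, 0]

print("Pearson r:", pearson_r(test_scores, retest_scores))
print("ICC (one-way):", icc_oneway(test_scores, retest_scores))
print("Effect size:", effect_size([3] * 5, test_scores))
print("Guyatt-type statistic:", guyatt_statistic(improved, unimproved))
print("ROC AUC:", roc_auc(improved, unimproved))
```

In this toy data the retest scores differ from the test scores only by a constant shift, so the Pearson r is perfect while the ICC is lower. This is one way to see why the ICC is preferred for reproducibility: it penalizes systematic disagreement that a correlation coefficient ignores.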