Hahn Elizabeth A, Bode Rita K, Du Hongyan, Cella David
Center on Outcomes, Research and Education, Evanston Northwestern Healthcare, Evanston, Illinois, USA.
Clin Trials. 2006;3(3):280-90. doi: 10.1191/1740774506cn148oa.
In order to make meaningful cross-cultural or cross-linguistic comparisons of health-related quality of life (HRQL) or to pool international research data, it is essential to create unbiased measures that can detect clinically important differences. When HRQL scores differ between cultural/linguistic groups, it is important to determine whether this reflects real group differences, or is the result of systematic measurement variability.
To investigate the linguistic measurement equivalence of a cancer-specific HRQL questionnaire, and to conduct a sensitivity analysis of treatment differences in HRQL in a clinical trial.
Patients with newly diagnosed chronic myelogenous leukemia (n = 1049) completed serial HRQL assessments in an international Phase III trial. Two types of differential item functioning (uniform and non-uniform) were evaluated using item response theory and classical test theory approaches. A sensitivity analysis was conducted to compare HRQL between treatment arms using items without evidence of differential functioning.
Among 27 items, nine (33%) showed no evidence of differential functioning in either linguistic comparison (English versus French, English versus German). Although the remaining 18 items functioned differently, there was no evidence of systematic bias. In a sensitivity analysis, adjustment for differential functioning affected the magnitude, but not the direction or interpretation, of clinical trial treatment arm differences.
Sufficient sample sizes were available for only three of the eight language groups. Identification of differential functioning in two-thirds of the items suggests that current psychometric methods may be too sensitive.
Enhanced methodologies are needed to differentiate trivial from substantive differential item functioning. Systematic variability in HRQL across different groups can be evaluated for its effect upon clinical trial results, a practice recommended when data are pooled across cultural or linguistic groups to draw conclusions about treatment effects.
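The distinction between uniform and non-uniform differential item functioning can be illustrated with a minimal sketch. This is not the item response theory or classical test theory procedure used in the trial. It is a simplified, standardization-style check, assuming respondents have already been grouped into strata of comparable overall HRQL. The function names, the `'ref'`/`'focal'` group labels, and the toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def conditional_differences(rows):
    """Item-score gap between language groups within each matching stratum.

    rows: iterable of (group, stratum, item_score), where group is
    'ref' or 'focal' (hypothetical labels) and stratum is a bin of the
    matching variable, for example a banded total score.
    Returns {stratum: (focal_mean - ref_mean, n_focal)} for every
    stratum that contains respondents from both groups.
    """
    by_cell = defaultdict(list)
    for group, stratum, score in rows:
        by_cell[(stratum, group)].append(score)

    result = {}
    for stratum in sorted({s for s, _ in by_cell}):
        ref = by_cell.get((stratum, "ref"))
        focal = by_cell.get((stratum, "focal"))
        if ref and focal:
            result[stratum] = (mean(focal) - mean(ref), len(focal))
    return result


def summarize_dif(rows):
    """Summarize the within-stratum gaps for one item.

    weighted_difference: gaps averaged with focal-group counts as
    weights; a value far from zero is the pattern expected under
    uniform differential functioning.
    sign_changes: True when the gap is positive in some strata and
    negative in others, the pattern expected under non-uniform
    differential functioning.
    """
    diffs = conditional_differences(rows)
    if not diffs:
        raise ValueError("no stratum contains both groups")
    total_focal = sum(n for _, n in diffs.values())
    weighted = sum(d * n for d, n in diffs.values()) / total_focal
    signs = {d > 0 for d, _ in diffs.values() if d != 0}
    return {"weighted_difference": weighted, "sign_changes": len(signs) > 1}


# Toy data: the focal group scores one point higher in every stratum.
uniform_rows = [
    ("ref", 0, 1), ("ref", 0, 1), ("focal", 0, 2), ("focal", 0, 2),
    ("ref", 1, 3), ("focal", 1, 4),
]
# Toy data: the gap reverses direction between strata.
crossing_rows = [
    ("ref", 0, 1), ("focal", 0, 2),
    ("ref", 1, 3), ("focal", 1, 2),
]

print(summarize_dif(uniform_rows))
print(summarize_dif(crossing_rows))
```

A descriptive check of this kind carries no significance test and no effect-size threshold, so on its own it cannot separate trivial from substantive differential functioning.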