Brundage M D, Pater J L, Zee B
Department of Community Health and Epidemiology, Queen's University, Kingston, Ontario, Canada.
J Natl Cancer Inst. 1993 Jul 21;85(14):1138-48. doi: 10.1093/jnci/85.14.1138.
The toxicity of a given cancer therapy is an important end point in clinical trials examining the potential costs and benefits of that therapy. Treatment-related toxicity is conventionally measured with one of several toxicity criteria grading scales, even though the reliability and validity of these scales have not been established.
We determined the reliability of the National Cancer Institute of Canada Clinical Trials Group (NCIC-CTG) expanded toxicity scale and the World Health Organization (WHO) standard toxicity scale by use of a clinical simulation of actual patients.
Seven experienced data managers each interviewed 12 simulated patients and scored their respective acute toxic effects. Inter-rater agreement (agreement between multiple raters of the same case) was calculated using the kappa (κ) statistic across all seven randomly assigned raters for each of 18 toxicity categories (13 NCIC-CTG and five WHO categories). Intra-rater agreement (agreement within the same rater on one case rated on separate occasions) was calculated using κ over repeated cases (where raters were blinded to the repeated nature of the subjects). Proportions of agreement (estimates of the probability of two randomly selected raters assigning the same toxicity grade to a given case) were also calculated for inter-rater agreement. Since minor lack of agreement might have adversely affected these statistics of agreement, both κ and proportion-of-agreement analyses were repeated for the following condensed grading categories: none (0) versus low-grade (1 or 2) versus high-grade (3 or 4) toxicity present.
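The agreement statistics described above can be sketched in code. The following is an illustrative example, not part of the original study: it computes Cohen's kappa for two hypothetical raters (the study used a multi-rater kappa across seven raters) and shows how collapsing the 0-4 toxicity grades into the condensed none / low-grade / high-grade categories can raise agreement. The rating data are invented for demonstration.

```python
# Illustrative sketch: Cohen's kappa for two raters, plus the condensed
# grading scheme (none = 0, low-grade = 1-2, high-grade = 3-4).
# Ratings below are hypothetical, not data from the study.

def cohens_kappa(r1, r2, categories):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(r1)
    # Observed proportion of agreement between the two raters.
    po = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement expected from each rater's marginal category frequencies.
    pe = sum((r1.count(c) / n) * (r2.count(c) / n) for c in categories)
    return (po - pe) / (1 - pe)

def condense(grade):
    # Collapse toxicity grades: 0 -> none, 1-2 -> low-grade, 3-4 -> high-grade.
    return 0 if grade == 0 else (1 if grade <= 2 else 2)

# Hypothetical toxicity grades (0-4) assigned by two raters to 12 cases.
rater_a = [0, 1, 2, 3, 0, 4, 1, 2, 0, 3, 1, 0]
rater_b = [0, 2, 2, 3, 1, 4, 1, 1, 0, 4, 1, 0]

kappa_full = cohens_kappa(rater_a, rater_b, range(5))
kappa_cond = cohens_kappa([condense(g) for g in rater_a],
                          [condense(g) for g in rater_b], range(3))
print(round(kappa_full, 3), round(kappa_cond, 3))  # prints "0.571 0.871"
```

In this toy example, most disagreements are one grade apart within the same condensed band, so collapsing the scale raises kappa noticeably, mirroring the pattern (though not the magnitude) reported in the results below.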
Modest levels of inter-rater reliability were demonstrated in this study, with kappa values that ranged from 0.50 to 1.00 in laboratory-based categories and from -0.04 to 0.82 in clinically based categories. Proportions of agreement for clinical categories ranged from 0.52 to 0.98. Condensing the toxicity grades improved the statistics of agreement, but substantial lack of agreement remained (kappa range, -0.04 to 0.82; proportions of agreement range, 0.67 to 0.98).
Experienced data managers, when interviewing patients, draw varying conclusions regarding toxic effects experienced by such patients. Neither the NCIC-CTG expanded toxicity scale nor the WHO standard toxicity scale demonstrated a clear superiority in reliability, although the breadth of toxic effects recorded differed.