Müller Matthias J, Szegedi Armin
Department of Psychiatry, University of Mainz, Germany.
J Clin Psychopharmacol. 2002 Jun;22(3):318-25. doi: 10.1097/00004714-200206000-00013.
Although rater training is increasingly used to improve the quality of the investigated outcome parameters, the reliability of assessments is not perfect. Thus, empirical reliability estimates should be used instead of theoretically assumed perfect reliability. Implications of the reliability of psychiatric assessments for sample size and power calculations in clinical trials are presented. The theoretical basis of sample size and power calculations using empirical reliability scores is delineated. Examples from contemporary research on schizophrenia and depression are used to illustrate several implications for study design and interpretation of results. The tremendous impact of the lack of reliability of psychopathologic assessments on sample size, power, and detectable true score differences in clinical trials is shown. The problem of multiple outcome variables with different reliabilities is addressed. Studies lacking power because of unreliable assessments carry the risk of false-negative findings and raise ethical questions. Rater training is strongly recommended to assess and improve interrater reliability whenever necessary and possible before trials are started. Sample size calculations and power analysis should be based on empirical reliability values of outcome parameters as part of quality assurance and cost savings.
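The dependence of sample size on reliability can be illustrated with a minimal sketch. The abstract does not state the authors' formulas, so the following rests on assumptions that may differ from the paper: the classical test theory result that measurement error attenuates a standardized true-score difference by the square root of the reliability, and a normal-approximation formula for a two-sided, two-group comparison of means. The function name `n_per_group` and its parameters are hypothetical and chosen only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(true_d, reliability=1.0, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided, two-group comparison of means.

    true_d      -- standardized difference in true-score units (assumed input)
    reliability -- empirical reliability of the outcome measure, in (0, 1]
    """
    if true_d == 0:
        raise ValueError("true_d must be non-zero")
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    if not 0.0 < alpha < 1.0 or not 0.0 < power < 1.0:
        raise ValueError("alpha and power must lie in (0, 1)")

    # Assumption (classical test theory): error variance inflates the observed
    # standard deviation, so the observable effect is true_d * sqrt(reliability).
    observed_d = true_d * sqrt(reliability)

    # Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2 per group.
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = std_normal.inv_cdf(power)
    return ceil(2.0 * (z_alpha + z_power) ** 2 / observed_d ** 2)


if __name__ == "__main__":
    # Compare a perfectly reliable scale with progressively less reliable ones.
    for r in (1.0, 0.9, 0.7, 0.5):
        print(f"reliability={r:.1f}  n per group={n_per_group(0.5, r)}")
```

Under these assumptions the required sample size scales with the inverse of the reliability: for a true standardized difference of 0.5 at 80% power, a reliability of 0.7 raises the approximate requirement from 63 to 90 patients per group. An exact calculation based on the t distribution would give slightly larger numbers.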