Department of Psychiatry and Psychotherapy II, Mental Health & Old Age Psychiatry, Ulm University, Ulm, Germany.
BMC Med Res Methodol. 2011 Dec 16;11:169. doi: 10.1186/1471-2288-11-169.
The therapeutic efficacy of an intervention is often assessed in clinical trials with scales that measure multiple diverse activities, whose scores are added to produce a cumulative global score. Medical communities and health care systems subsequently use these data to calculate pooled effect sizes to compare treatments. This is done because major doubt has been cast on the clinical relevance of statistically significant findings that rely on p values, which can reflect chance findings. To overcome this, pooling the results of clinical studies into a meta-analysis with a statistical calculus has been assumed to be a more definitive way of deciding efficacy.
We simulate therapeutic effects as measured with additive scales in patient cohorts with different disease severity, and assess the limitations of effect size calculations for additive scales; these limitations are proven mathematically.
We demonstrate that the major problem, which cannot be overcome by current numerical methods, is the complex nature and neurobiological foundation of clinical psychiatric endpoints in particular and of additive scales in general. This is particularly relevant for endpoints used in dementia research. 'Cognition' is composed of functions such as memory, attention, orientation and many more. These individual functions decline in varied and non-linear ways. Here we demonstrate that in progressive diseases, cumulative values from multidimensional scales are distorted by the limitations of the additive scale. The non-linearity of the decline of function impedes the calculation of effect sizes based on cumulative values from these multidimensional scales.
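The distortion described here can be illustrated with a minimal simulation sketch. It is not the authors' simulation: the three sigmoid-shaped item curves, their midpoints, the steepness, the measurement noise, the cohort severities and the size of the treatment benefit are all hypothetical values chosen for illustration. Each simulated patient receives the same underlying benefit (a fixed shift on a latent severity axis), yet the standardized effect size (Cohen's d) computed from the summed score differs by cohort, because the bounded additive scale has little room to register change near its ceiling and floor.

```python
import math
import random
import statistics

# All constants below are hypothetical, chosen only for illustration.
ITEM_MAX = 10.0                      # maximum score of each item (function)
MIDPOINTS = (3.0, 5.0, 7.0)          # latent severity at which each function is half lost
STEEPNESS = 2.0                      # how abruptly each function declines
SCALE_MAX = ITEM_MAX * len(MIDPOINTS)


def item_score(severity, midpoint):
    """Non-linear (sigmoid) decline of one function with latent severity."""
    return ITEM_MAX / (1.0 + math.exp(STEEPNESS * (severity - midpoint)))


def total_score(severity, rng, noise_sd=1.0):
    """Additive global score: sum of items plus measurement noise, bounded by the scale."""
    true_total = sum(item_score(severity, m) for m in MIDPOINTS)
    observed = true_total + rng.gauss(0.0, noise_sd)
    return min(SCALE_MAX, max(0.0, observed))


def cohens_d(treated, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = (
        (n1 - 1) * statistics.variance(treated)
        + (n2 - 1) * statistics.variance(control)
    ) / (n1 + n2 - 2)
    return (statistics.mean(treated) - statistics.mean(control)) / math.sqrt(pooled_var)


def simulate_cohort(mean_severity, benefit=0.5, n=2000, seed=0):
    """Two-arm trial in which treatment lowers latent severity by a fixed amount."""
    rng = random.Random(seed)
    control = [total_score(rng.gauss(mean_severity, 0.5), rng) for _ in range(n)]
    treated = [total_score(rng.gauss(mean_severity, 0.5) - benefit, rng) for _ in range(n)]
    return cohens_d(treated, control)


# Hypothetical cohorts on a latent severity axis running from 0 (healthy) to 10.
COHORTS = {"mild": 1.0, "moderate": 5.0, "severe": 9.0}
effect_sizes = {name: simulate_cohort(sev) for name, sev in COHORTS.items()}

for name, d in effect_sizes.items():
    print(f"{name:>8}: Cohen's d = {d:.2f}")
```

Under these assumptions the moderate cohort, whose patients sit on the steep part of the item curves, yields a clearly larger effect size than the mild or severe cohorts, even though the simulated benefit is identical in all three. Pooling such effect sizes across trials with different severity mixes would therefore mix quantities that are not comparable.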
Statistical analysis needs to be guided by the boundaries of the biological condition. As an alternative, we suggest a different approach that avoids the error imposed by over-analysis of cumulative global scores from additive scales.