Houtvast Dion C J, Betz Milan W, Van Hooren Bas, Vanbelle Sophie, Verdijk Lex B, van Loon Luc J C, Trommelen Jorn
Department of Human Biology, Institute of Nutrition and Translational Research in Metabolism (NUTRIM), Maastricht University, the Netherlands.
Department of Nutrition and Movement Sciences, Institute of Nutrition and Translational Research in Metabolism (NUTRIM), Maastricht University, the Netherlands.
Clin Nutr ESPEN. 2024 Dec;64:334-343. doi: 10.1016/j.clnesp.2024.10.152. Epub 2024 Oct 24.
Biomedical research frequently employs null hypothesis testing to determine whether an observed difference in a sample is likely to exist in the broader population. Null hypothesis testing generally assumes that differences between groups or interventions are non-existent unless proven otherwise. Because biomedical studies with human subjects are often limited by financial and logistical resources, they tend to have low statistical power, i.e., a low probability of statistically confirming a true difference. As a result, small but potentially clinically important differences may be overlooked or dismissed simply because no statistically significant difference was found. This absence of significance is often misinterpreted as 'equivalence' of treatments. In this educational paper, we use practical examples related to the effects of exercise and nutrition on muscle protein metabolism to illustrate the most important determinants of statistical power, as well as their implications for both investigators and readers of scientific articles.
Changes in muscle mass occur at a relatively slow rate, which makes it practically challenging to detect differences between treatment groups in long-term studies. First, one way to make it 'easier' to differentiate between groups, and hence to increase statistical power, is to choose a study duration long enough for treatment effects to become apparent. This is especially relevant when comparing treatments with relatively small expected differences, such as modest changes in daily protein intake. Second, one could try to minimize the variance and response heterogeneity within groups, for example by using strict inclusion criteria and standardization protocols (e.g., meal provision), by using cross-over designs, or even by using within-subject designs in which two interventions are compared simultaneously (e.g., an exercised limb vs a contralateral control limb), although this may limit the generalizability of the findings (e.g., such single-limb exercise training is not common in practice).
In terms of data interpretation, investigators should refrain from drawing strong conclusions from underpowered studies. Yet such studies still provide valuable data for meta-analyses. Finally, because muscle protein synthesis rates are highly responsive to anabolic stimuli, acute metabolic studies are more sensitive for detecting potentially clinically relevant differences in the anabolic response between treatments. Apart from elaborating further on these topics, this educational article encourages readers to question null findings more critically and scientists to discuss more clearly the limitations that may have compromised statistical power.
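To make the link between design choices and statistical power concrete, the sketch below compares a parallel-group design with a cross-over design using a normal approximation to a two-sided test. It is an illustration under stated assumptions, not the authors' method: the function names are hypothetical, the standardized effect size (0.3), sample size (12) and within-subject correlation (rho = 0.8) are made-up illustrative values rather than data from the paper, and period or carry-over effects as well as the t-distribution correction are ignored, so real sample-size estimates would be somewhat larger.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def power_two_sided(effect: float, se: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test (normal approximation).

    `effect` is the assumed true difference, `se` its standard error.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / se
    # Probability that the test statistic lands in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def se_parallel(sd: float, n_per_group: int) -> float:
    """SE of the difference in means between two independent, equal-size groups."""
    return sd * sqrt(2 / n_per_group)


def se_crossover(sd: float, n_subjects: int, rho: float) -> float:
    """SE of the mean within-subject difference when every subject receives both treatments.

    `rho` is the (assumed) correlation between a subject's two responses; the higher it is,
    the more between-subject variance cancels out of the treatment contrast.
    """
    return sd * sqrt(2 * (1 - rho) / n_subjects)


def n_for_power(effect: float, sd: float, power: float = 0.8,
                alpha: float = 0.05, rho: float = 0.0) -> int:
    """Sample size for a target power under the normal approximation.

    With rho = 0 this is the per-group n of a parallel design; with 0 < rho < 1 it is
    the number of subjects in a cross-over design (period/carry-over effects ignored).
    """
    z_sum = _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)
    return ceil(2 * (1 - rho) * (z_sum * sd / effect) ** 2)


if __name__ == "__main__":
    d, sd, n = 0.3, 1.0, 12  # hypothetical standardized effect, SD and sample size
    print(f"parallel power:  {power_two_sided(d, se_parallel(sd, n)):.2f}")        # ≈ 0.11
    print(f"crossover power: {power_two_sided(d, se_crossover(sd, n, 0.8)):.2f}")  # ≈ 0.38
    print("n for 80% power, parallel (per group):", n_for_power(d, sd))            # 175
    print("n for 80% power, cross-over (rho=0.8):", n_for_power(d, sd, rho=0.8))   # 35
```

With the same hypothetical effect and 12 participants, the parallel design has roughly an 11% chance of detecting the difference, so a non-significant result says little about equivalence. Removing between-subject variance through a cross-over design raises power considerably, and a larger true effect (for example, one allowed to accumulate over a longer study) raises it as well, because power depends on the ratio of effect to standard error.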