Gaskill Brianna N, Garner Joseph P
J Am Assoc Lab Anim Sci. 2020 Jan 1;59(1):9-16. doi: 10.30802/AALAS-JAALAS-19-000042. Epub 2019 Dec 18.
The practical application of statistical power is becoming an increasingly important part of experimental design, data analysis, and reporting. Power is essential to estimating sample size as part of planning studies and obtaining ethical approval for them. Furthermore, power is essential for publishing and interpreting negative results. In this manuscript, we review what power is, how it can be calculated, and how to report a null result if one is found. Power can be thought of as reflecting the signal-to-noise ratio of an experiment. The conventional wisdom that statistical power is driven by sample size (which increases the signal in the data), while true, is a misleading oversimplification. Relatively little discussion covers the use of experimental designs that control and reduce noise. Even small improvements in experimental design can achieve high power at much lower sample sizes than (for instance) a simple test. Failure to report the experimental design or the proposed statistical test on animal care and use protocols creates a dilemma for IACUCs, because it is unknown whether sample size has been correctly calculated. Traditional power calculations, which are primarily provided for animal number justifications, are available only for simple, yet low-powered, experimental designs, such as paired tests. Thus, in most controlled experimental studies, the only analyses for which power can be calculated are those that inherently have low statistical power; these analyses should not be used because they require more animals than necessary. We provide suggestions for more powerful experimental designs (such as randomized block and factorial designs), and we describe methods to easily calculate sample size for these designs that are suitable for IACUC number justifications. Finally, we provide recommendations for reporting negative results, so that readers and reviewers can determine whether an experiment had sufficient power. The use of more sophisticated designs in animal experiments will inevitably improve power and reproducibility and reduce animal use.
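The signal-to-noise view of power can be illustrated with a minimal sketch. It is not the calculation method described in the paper. It assumes a two-sided comparison of two group means and uses the standard normal approximation, which slightly underestimates the sample size an exact t-based calculation would give. The function name `n_per_group`, the effect size `delta`, and the noise values `sd` are hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-group comparison.

    Normal-approximation formula (an assumption of this sketch):
        n = 2 * ((z_{1 - alpha/2} + z_{power}) * sd / delta) ** 2
    delta: difference in means to detect (the signal)
    sd:    within-group standard deviation (the noise)
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Hold the signal fixed and shrink the noise, as a design that controls
# nuisance variation (for example, blocking) is intended to do.
for sd in (1.0, 0.75, 0.5):
    n = n_per_group(delta=1.0, sd=sd)
    print(f"noise SD = {sd}: about {n} animals per group")
```

Under these assumptions the required sample size scales with the square of the noise-to-signal ratio, so halving the noise cuts the approximate requirement to about a quarter (from 16 to 4 animals per group in this illustration).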