Seven ways to increase power without increasing N.

Authors

Hansen W B, Collins L M

Affiliation

Department of Public Health Sciences, Bowman Gray School of Medicine, Winston-Salem, NC 27157-1063, USA.

Publication

NIDA Res Monogr. 1994;142:184-95.

PMID: 9243537

Abstract

Many readers of this monograph may wonder why a chapter on statistical power was included. After all, by now the issue of statistical power is in many respects mundane. Everyone knows that statistical power is a central research consideration, and certainly most National Institute on Drug Abuse grantees or prospective grantees understand the importance of including a power analysis in research proposals. However, there is ample evidence that, in practice, prevention researchers are not paying sufficient attention to statistical power. If they were, the findings observed by Hansen (1992) in a recent review of the prevention literature would not have emerged. Hansen (1992) examined statistical power based on 46 cohorts followed longitudinally, using nonparametric assumptions given the subjects' age at posttest and the numbers of subjects. Results of this analysis indicated that, in order for a study to attain 80-percent power for detecting differences between treatment and control groups, the difference between groups at posttest would need to be at least 8 percent (in the best studies) and as much as 16 percent (in the weakest studies). In order for a study to attain 80-percent power for detecting group differences in pre-post change, 22 of the 46 cohorts would have needed relative pre-post reductions of greater than 100 percent. Thirty-three of the 46 cohorts had less than 50-percent power to detect a 50-percent relative reduction in substance use. These results are consistent with other review findings (e.g., Lipsey 1990) that have shown a similar lack of power in a broad range of research topics. Thus, it seems that, although researchers are aware of the importance of statistical power (particularly of the necessity for calculating it when proposing research), they somehow are failing to end up with adequate power in their completed studies. 
This chapter argues that the failure of many prevention studies to maintain adequate statistical power is due to an overemphasis on sample size (N) as the only, or even the best, way to increase statistical power. It is easy to see how this overemphasis has come about. Sample size is easy to manipulate, has the advantage of being related to power in a straightforward way, and usually is under the direct control of the researcher, except for limitations imposed by finances or subject availability. Another option for increasing power is to increase the alpha used for hypothesis testing but, as very few researchers seriously consider significance levels much larger than the traditional .05, this strategy seldom is used. Of course, sample size is important, and the authors of this chapter are not recommending that researchers cease choosing sample sizes carefully. Rather, they argue that researchers should not confine themselves to increasing N to enhance power. It is important to take additional measures to maintain and improve power over and above making sure the initial sample size is sufficient. The authors recommend two general strategies. One strategy involves attempting to maintain the effective initial sample size so that power is not lost needlessly. The other strategy is to take measures to maximize the third factor that determines statistical power: effect size.
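
The interplay of the three factors named in the abstract (sample size, alpha, and effect size) can be illustrated with a minimal sketch. It is not the chapter's own method: it assumes a two-sided, two-sample z-test with equal group sizes and a standardized mean difference as the effect size, and the function name `two_sample_power` and its parameters are hypothetical names chosen for this illustration.

```python
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumptions (not from the source): equal group sizes, known unit
    variance, and effect_size given as a standardized mean difference.
    """
    z = NormalDist()
    # Critical value for a two-sided test at the chosen alpha.
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


if __name__ == "__main__":
    baseline = two_sample_power(effect_size=0.3, n_per_group=100)
    more_n = two_sample_power(effect_size=0.3, n_per_group=200)
    bigger_effect = two_sample_power(effect_size=0.45, n_per_group=100)
    larger_alpha = two_sample_power(effect_size=0.3, n_per_group=100, alpha=0.10)
    print(f"baseline:            {baseline:.3f}")
    print(f"double N:            {more_n:.3f}")
    print(f"larger effect size:  {bigger_effect:.3f}")
    print(f"alpha raised to .10: {larger_alpha:.3f}")
```

Under these assumptions, power rises when any one of the three inputs rises, so a larger effect size can offset a smaller N. This is the kind of trade-off the abstract points to when it recommends maximizing effect size and protecting the effective sample size.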
