Kingman A, Zion G
Epidemiology & Oral Diseases Prevention Program, National Institute of Dental Research, Bethesda, MD 20892.
Stat Med. 1994;13(5-7):769-83. doi: 10.1002/sim.4780130537.
Conventional wisdom suggests that for small data sets with substantial skew, one should attempt to determine the correct distributional form, if possible, and apply statistical methods appropriate for that distribution. Transformations such as the log or square root are often used. If an appropriate distributional form cannot be determined, a distribution-free procedure such as a rank transformation or a randomization test can be used. To better appreciate the effect of such alternatives on both the type I error and the power to detect differences between treatment groups, simulation studies were conducted for responses having specific gamma G(r, theta) and log-normal ln(M, V) distributions. The gamma and log-normal distributions were selected so that they had the same first two moments. A simple two-group design was assumed. The reference group always had an average disease level mu = 3.0 (mu = r theta for gamma, mu = M for log-normal), and the treatment group means were reduced by 0 per cent to 50 per cent. The effect of distributional type and the degree of skewness was investigated by varying the population parameter values. Six statistical test procedures were compared for the gamma distributions. All test procedures were robust with respect to type I error. The uniformly most powerful (UMP) test, based on a ratio of sample means, produced the greatest power for all combinations of n, r and RT. The power losses associated with the randomization test, the t-test on the original scale, and the t-test on the square root scale were very small (3 per cent to 6 per cent in absolute value) for n = 10 and 15, and less than 2 per cent for group sizes of 25 or more. The power loss associated with the t-test on the log scale was much larger, with power 5 per cent to 10 per cent lower than that of the t-test on the original scale. The Wilcoxon rank test produced results similar to those of the LOG t-test for small samples. The power for the shifted LOG(X+c) test increased monotonically to the asymptotic value of the ORIG t-test. The same five test procedures based on differences in sample means were then compared for the corresponding log-normal distributions. The UMP test, that is, LOG(X), produced the highest power. The SQRT t-test lost very little power. The loss in power varied between 2 per cent and 5 per cent for the RANK test. (ABSTRACT TRUNCATED AT 400 WORDS)
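The simulation design described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes that a gamma G(r, theta) has mean r theta and variance r theta^2, that the matching log-normal is found by solving for the parameters of the underlying normal, and that the treatment group keeps the same shape r with a reduced scale (the abstract does not say how the reduction was applied). All function names, the number of permutations, and the number of simulated trials are hypothetical choices, and only the randomization test on the difference in means is shown.

```python
import math
import random
import statistics


def gamma_params(mean, shape):
    """Return (shape r, scale theta) so that r * theta equals the mean."""
    return shape, mean / shape


def lognormal_params(mean, variance):
    """Return (mu_log, sigma_log) of the underlying normal for a log-normal
    with the given mean and variance on the original scale."""
    sigma2 = math.log(1.0 + variance / mean ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def randomization_pvalue(x, y, n_perm=2000, rng=random):
    """Two-sided Monte Carlo randomization test on the difference in means."""
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = abs(statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def estimate_power(n, shape, reduction, n_sim=200, n_perm=500, alpha=0.05, seed=1):
    """Estimate the rejection rate for gamma responses with reference mean 3.0.

    Assumption: the treatment group has the same shape and a smaller scale.
    With reduction=0.0 the result estimates the type I error rate.
    """
    rng = random.Random(seed)
    r, theta_ref = gamma_params(3.0, shape)
    _, theta_trt = gamma_params(3.0 * (1.0 - reduction), shape)
    rejections = 0
    for _ in range(n_sim):
        ref = [rng.gammavariate(r, theta_ref) for _ in range(n)]
        trt = [rng.gammavariate(r, theta_trt) for _ in range(n)]
        if randomization_pvalue(ref, trt, n_perm=n_perm, rng=rng) <= alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    r, theta = gamma_params(3.0, shape=2.0)
    # Log-normal with the same mean and variance as G(r, theta).
    mu_log, sigma_log = lognormal_params(3.0, r * theta ** 2)
    print("log-normal parameters:", mu_log, sigma_log)
    print("type I error estimate:", estimate_power(10, 2.0, reduction=0.0))
    print("power at 50% reduction:", estimate_power(10, 2.0, reduction=0.5))
```

Log-normal samples with the matched moments could be drawn with `random.lognormvariate(mu_log, sigma_log)`. The t-tests and the Wilcoxon test are left out because their p-values need distribution functions that are not in the Python standard library.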