Assessment of the Gould-Shih procedure for sample size re-estimation.

Department of Biostatistics and Data Management, Wyeth Consumer Healthcare, Madison, NJ 07940, USA.

Pharm Stat. 2007 Jan-Mar;6(1):53-65. doi: 10.1002/pst.244.

The power of a clinical trial depends in part on its sample size. With continuous data, the sample size needed to attain a desired power is a function of the within-group standard deviation. An estimate of this standard deviation can be obtained during the trial itself from interim data; the estimate is then used to re-estimate the sample size. Gould and Shih proposed a method, based on the EM algorithm, which they claim produces a maximum likelihood estimate of the within-group standard deviation while preserving the blind; they also claim that the estimate is quite satisfactory. However, others have claimed that the method can produce non-unique estimates and/or severe underestimates of the true within-group standard deviation. Here the method is examined thoroughly to resolve the conflicting claims and, via simulation, to assess its validity and the properties of its estimates. The results show that the apparent non-uniqueness of the method's estimate is due to a seemingly innocuous alteration that Gould and Shih made to the EM algorithm. When this alteration is removed, the method is valid in that it produces the maximum likelihood estimate of the within-group standard deviation (and also of the within-group means). However, the estimate is negatively biased and has a large standard deviation. The simulations show that with a standardized difference of 1 or less, which is typical of most clinical trials, the standard deviation of the combined samples, ignoring the groups, is a better estimator despite its obvious positive bias.
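
The comparison described above can be illustrated with a small Python sketch. This is not the Gould-Shih code and not the simulation reported in the paper. It assumes normal data, 1:1 allocation, and a common standard deviation, and it fits a plain two-component normal mixture by EM without using the group labels. The function names `per_group_n` and `em_blinded_sd`, the starting values, the fixed iteration count, and the simulation settings are all hypothetical choices made for illustration. The sample size function uses the textbook normal-approximation formula, which the abstract does not specify.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def per_group_n(sigma, delta, alpha=0.05, power=0.9):
    """Per-group n for a two-sided, two-sample z-test (textbook approximation)."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2)


def em_blinded_sd(x, iters=200):
    """EM for a 50:50 mixture of two normals sharing one SD.

    Group labels are never used, so the blind is preserved.
    Returns (mu1, mu2, sigma) after a fixed number of iterations
    (no convergence check; an assumption made to keep the sketch short).
    """
    n = len(x)
    m, s = mean(x), stdev(x)
    mu1, mu2, sigma = m - 0.5 * s, m + 0.5 * s, s  # arbitrary starting values
    for _ in range(iters):
        # E-step: posterior probability that each observation is in group 1.
        w = []
        for xi in x:
            d = 0.5 * ((xi - mu1) ** 2 - (xi - mu2) ** 2) / sigma ** 2
            w.append(1.0 / (1.0 + math.exp(min(d, 700.0))))  # clamp avoids overflow
        # M-step: weighted means, then the common within-group SD.
        sw = sum(w)
        mu1 = sum(wi * xi for wi, xi in zip(w, x)) / sw
        mu2 = sum((1 - wi) * xi for wi, xi in zip(w, x)) / (n - sw)
        var = sum(wi * (xi - mu1) ** 2 + (1 - wi) * (xi - mu2) ** 2
                  for wi, xi in zip(w, x)) / n
        sigma = math.sqrt(var)
    return mu1, mu2, sigma


# Small simulation: true SD 1, standardized difference 1, 40 subjects per group.
random.seed(2007)  # arbitrary seed
true_sigma, delta, n_arm = 1.0, 1.0, 40
em_est, pooled_est = [], []
for _ in range(50):
    x = ([random.gauss(0.0, true_sigma) for _ in range(n_arm)]
         + [random.gauss(delta, true_sigma) for _ in range(n_arm)])
    em_est.append(em_blinded_sd(x, iters=100)[2])
    pooled_est.append(stdev(x))  # SD of the combined sample, ignoring groups

print("EM estimate:     mean %.3f, SD %.3f" % (mean(em_est), stdev(em_est)))
print("Combined sample: mean %.3f, SD %.3f" % (mean(pooled_est), stdev(pooled_est)))
print("Re-estimated n per group from last EM estimate:",
      per_group_n(em_est[-1], delta))
```

In this sketch the EM estimate can never exceed the combined-sample standard deviation, because the total variance splits into a within-component part and a non-negative between-component part. That is consistent with the opposite signs of bias reported for the two estimators.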
