Dai Jianliang, Li Li, Kim Sangkyu, Kimball Beth, Jazwinski S Michal, Arnold Jonathan
Genetics Department, University of Georgia, Athens, Georgia 30602, USA.
Biometrics. 2007 Dec;63(4):1245-52. doi: 10.1111/j.1541-0420.2007.00801.x.
In the Georgia Centenarian Study (Poon et al., Exceptional Longevity, 2006), centenarian cases and young controls are classified according to three factors (age, ethnic origin, and single nucleotide polymorphisms [SNPs] of candidate longevity genes), each with two possible levels. Here we provide methodologies to determine the minimum sample size needed to detect dependence in 2 x 2 x 2 tables based on Fisher's exact test, evaluated either exactly or by Markov chain Monte Carlo (MCMC), assuming only the case total L and the control total N are known. While our MCMC method uses serial computing, parallel computing techniques are employed to solve the exact sample size problem. These tools will allow researchers to design efficient sampling strategies and to select informative SNPs. We apply our tools to 2 x 2 x 2 tables obtained from a pilot study of the Georgia Centenarian Study, and the sample size results provided important information for the subsequent major study. A comparison between the exact method and the MCMC method showed that the MCMC method needed much less computation time (10.16 times faster on average for the situations examined, with S.E. = 2.60), but its sample size results were, as a rule, valid only for larger sample sizes (in the hundreds).
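The sample size question posed in the abstract can be illustrated with a much simpler case. The sketch below is not the authors' 2 x 2 x 2 procedure: it assumes a single 2 x 2 table (cases vs. controls by one binary SNP), with the case total L and control total N fixed, and computes the exact power of a two-sided Fisher's exact test by enumerating all possible tables. The function names and the design values `p_case` and `p_ctrl` (assumed carrier frequencies in cases and controls) are hypothetical and chosen only for illustration.

```python
from math import comb


def fisher_p_two_sided(x, y, L, N):
    """Two-sided Fisher exact p-value for the table [[x, L - x], [y, N - y]]."""
    m = x + y                              # column total, fixed by conditioning
    lo, hi = max(0, m - N), min(L, m)      # support of the hypergeometric count
    # Unnormalised hypergeometric weights, kept as integers for exact comparison.
    weights = [comb(L, k) * comb(N, m - k) for k in range(lo, hi + 1)]
    observed = weights[x - lo]
    # Sum the weights of all tables no more likely than the observed one.
    return sum(w for w in weights if w <= observed) / comb(L + N, m)


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_power(L, N, p_case, p_ctrl, alpha=0.05):
    """Exact power: total probability of all tables that Fisher's test rejects."""
    px = [binom_pmf(x, L, p_case) for x in range(L + 1)]
    py = [binom_pmf(y, N, p_ctrl) for y in range(N + 1)]
    power = 0.0
    for x in range(L + 1):
        for y in range(N + 1):
            if fisher_p_two_sided(x, y, L, N) <= alpha:
                power += px[x] * py[y]
    return power


def min_sample_size(p_case, p_ctrl, target=0.8, alpha=0.05, n_max=100):
    """First balanced design (L = N = n) whose exact power reaches the target.

    Returns None if no n up to n_max reaches it.
    """
    for n in range(2, n_max + 1):
        if exact_power(n, n, p_case, p_ctrl, alpha) >= target:
            return n
    return None


if __name__ == "__main__":
    n = min_sample_size(p_case=0.9, p_ctrl=0.1, target=0.8)
    print("smallest balanced n reaching 80% power:", n)
```

Because the test statistic is discrete, exact power need not increase monotonically with n, so the first n that reaches the target is only a starting point in this sketch. The enumeration cost also grows quickly with L and N, which is consistent with the abstract's motivation for parallel computing in the exact case and for MCMC as a faster alternative.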