Satagopan Jaya M, Elston Robert C
Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, New York 10021, USA.
Genet Epidemiol. 2003 Sep;25(2):149-57. doi: 10.1002/gepi.10260.
We propose a cost-effective two-stage approach to investigate gene-disease associations when testing a large number of candidate markers using a case-control design. Under this approach, all the markers are genotyped and tested at stage 1 using a subset of affected cases and unaffected controls, and the most promising markers are genotyped on the remaining individuals and tested using all the individuals at stage 2. The sample size at stage 1 is chosen such that the power to detect the true markers of association is 1-β₁ at significance level α₁. The most promising markers are tested at significance level α₂ at stage 2. In contrast, a one-stage approach would evaluate and test all the markers on all the cases and controls to identify the markers significantly associated with the disease. The goal is to determine the two-stage parameters (α₁, β₁, α₂) that minimize the cost of the study such that the desired overall significance is α and the desired power is close to 1-β, the power of the one-stage approach. We provide analytic formulae to estimate the two-stage parameters. The properties of the two-stage approach are evaluated under various parametric configurations and compared with those of the corresponding one-stage approach. The optimal two-stage procedure does not depend on the signal of the markers associated with the study. Further, when there is a large number of markers, the optimal procedure is not substantially influenced by the total number of markers associated with the disease. The results show that, compared to a one-stage approach, a two-stage procedure typically halves the cost of the study.
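To make the cost comparison in the abstract concrete, the following is a minimal sketch of how genotyping effort might be counted under the two designs. It is not the paper's analytic formula: it assumes that cost is simply the number of marker-by-individual genotypings, that every true marker passes stage 1 with probability 1-β₁, and that every null marker passes with probability α₁. The function names and all numeric inputs are hypothetical, and the sketch ignores the overall significance and power constraints that the actual optimization must satisfy.

```python
def expected_promising(m, d, alpha1, beta1):
    """Expected number of markers carried forward to stage 2.

    Assumption: each of the (m - d) null markers passes stage 1 with
    probability alpha1, and each of the d true markers passes with
    probability 1 - beta1.
    """
    return (m - d) * alpha1 + d * (1.0 - beta1)


def genotyping_costs(m, d, n_total, n_stage1, alpha1, beta1):
    """Return (one_stage_cost, two_stage_cost) in genotypings.

    Assumed cost model: one unit per marker per individual.
    One-stage: all m markers on all n_total individuals.
    Two-stage: all m markers on n_stage1 individuals, then only the
    promising markers on the remaining n_total - n_stage1 individuals.
    """
    one_stage = n_total * m
    promising = expected_promising(m, d, alpha1, beta1)
    two_stage = n_stage1 * m + (n_total - n_stage1) * promising
    return one_stage, two_stage


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    one, two = genotyping_costs(
        m=1000, d=10, n_total=1000, n_stage1=400, alpha1=0.1, beta1=0.05
    )
    print(f"one-stage genotypings: {one}")
    print(f"two-stage genotypings: {two:.0f}")
    print(f"cost ratio (two-stage / one-stage): {two / one:.3f}")
```

With these illustrative inputs, the two-stage design needs somewhat fewer than half as many genotypings as the one-stage design, which is in line with the abstract's statement that the cost is typically halved. The actual ratio depends on the chosen parameters.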