Satagopan Jaya M, Verbel David A, Venkatraman E S, Offit Kenneth E, Begg Colin B
Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, New York 10021, USA.
Biometrics. 2002 Mar;58(1):163-70. doi: 10.1111/j.0006-341x.2002.00163.x.
The goal of this article is to describe a two-stage design that maximizes the power to detect gene-disease associations when the principal design constraint is the total cost, represented by the total number of gene evaluations rather than the total number of individuals. In the first stage, all genes of interest are evaluated on a subset of individuals. The most promising genes are then evaluated on additional subjects in the second stage. This avoids spending further resources on genes that the first-stage results suggest are unlikely to be associated with disease. We consider both the case where the genes are correlated and the case where they are independent. Simulation results show that, as a general guideline when the genes are independent or only weakly correlated, using 75% of the resources in stage 1 to screen all the markers and spending the remaining resources on the most promising 10% of the markers provides near-optimal power across a broad range of parameter configurations. This translates to screening all the markers in stage 1 on approximately one quarter of the total required sample size.
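The "one quarter" figure follows from the budget arithmetic. With a budget of B gene evaluations and M markers, stage 1 types all markers on n1 = 0.75B/M subjects, and stage 2 types the top 0.1M markers on n2 = 0.25B/(0.1M) = 2.5B/M subjects, so n1/(n1 + n2) = 0.75/3.25 ≈ 0.23. The Python sketch below checks this and adds a rough Monte Carlo power estimate. It is a minimal illustration, not the article's method. The function names `stage_sizes` and `simulate_power` are hypothetical. The power calculation also makes several assumptions: a single associated gene among independent null markers, and stage-wise z-statistics with mean `effect * sqrt(n)`, where `effect` is a hypothetical standardized per-subject effect size. Markers are ranked by stage-1 |Z|, and the final test uses stage-2 data alone with a Bonferroni threshold over the markers carried forward. The article's own test statistics and error control may differ.

```python
import math
import random
from statistics import NormalDist


def stage_sizes(total_evals, n_markers, frac_stage1=0.75, frac_followup=0.10):
    """Split a budget of gene evaluations (subject x marker genotypings).

    Returns (n1, k, n2): subjects typed on all markers in stage 1, number of
    markers carried forward, and subjects typed on those k markers in stage 2.
    """
    n1 = frac_stage1 * total_evals / n_markers      # stage 1: every marker on n1 subjects
    k = max(1, round(frac_followup * n_markers))     # most promising markers kept
    n2 = (1 - frac_stage1) * total_evals / k         # stage 2: remaining budget over k markers
    return n1, k, n2


def simulate_power(total_evals, n_markers, effect, frac_stage1=0.75,
                   frac_followup=0.10, alpha=0.05, reps=500, seed=0):
    """Monte Carlo power for one associated gene among independent null markers.

    Assumption (not from the article): each stage's statistic is
    N(effect * sqrt(n), 1) for the associated gene and N(0, 1) for null markers.
    """
    n1, k, n2 = stage_sizes(total_evals, n_markers, frac_stage1, frac_followup)
    # Assumed stage-2 rule: stage-2 data only, two-sided Bonferroni over k markers.
    crit2 = NormalDist().inv_cdf(1 - alpha / (2 * k))
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        z_true = abs(rng.gauss(effect * math.sqrt(n1), 1.0))
        # Count null markers whose stage-1 |Z| exceeds the associated gene's.
        beaten_by = sum(abs(rng.gauss(0.0, 1.0)) > z_true
                        for _ in range(n_markers - 1))
        if beaten_by < k:  # associated gene ranks in the top k and is carried forward
            z2 = rng.gauss(effect * math.sqrt(n2), 1.0)
            if abs(z2) > crit2:
                hits += 1
    return hits / reps


n1, k, n2 = stage_sizes(1_000_000, 1000)
print(n1, k, n2)                 # 750.0 100 2500.0
print(round(n1 / (n1 + n2), 3))  # 0.231
print(simulate_power(1_000_000, 1000, effect=0.1))
```

With 1,000 markers and a budget of 10^6 evaluations, stage 1 types 750 subjects and stage 2 types 2,500 subjects on the top 100 markers. Stage 1 therefore uses about 23% of all subjects, consistent with the "one quarter" guideline. Testing on stage-2 data alone keeps the Bonferroni bound valid despite the data-driven selection, because the stage-2 subjects are independent of the stage-1 ranking. A joint analysis of both stages would need a different threshold.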