Tao Ran, Zeng Donglin, Lin Dan-Yu
Department of Biostatistics and Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232.
Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599.
J Am Stat Assoc. 2020;115(532):1946-1959. doi: 10.1080/01621459.2019.1671200. Epub 2019 Oct 29.
The two-phase design is a cost-effective sampling strategy to evaluate the effects of covariates on an outcome when certain covariates are too expensive to be measured on all study subjects. Under such a design, the outcome and inexpensive covariates are measured on all subjects in the first phase and the first-phase information is used to select subjects for measurements of expensive covariates in the second phase. Previous research on two-phase studies has focused largely on the inference procedures rather than the design aspects. We investigate the design efficiency of the two-phase study, as measured by the semiparametric efficiency bound for estimating the regression coefficients of expensive covariates. We consider general two-phase studies, where the outcome variable can be continuous, discrete, or censored, and the second-phase sampling can depend on the first-phase data in any manner. We develop optimal or approximately optimal two-phase designs, which can be substantially more efficient than the existing designs. We demonstrate the improvements of the new designs over the existing ones through extensive simulation studies and two large medical studies.
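The two-phase sampling scheme described above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the optimal design developed in the paper: it uses a simple outcome-dependent rule that selects subjects from the two tails of the outcome distribution. The function name `two_phase_sample`, the linear data-generating model, and all coefficient values are hypothetical and chosen only for illustration.

```python
import random


def two_phase_sample(n=1000, n2=200, seed=0):
    """Simulate a two-phase study with outcome-tail selection (illustrative only)."""
    if not 0 < n2 <= n:
        raise ValueError("n2 must satisfy 0 < n2 <= n")
    rng = random.Random(seed)

    # Phase one: outcome y and inexpensive covariate z are recorded for everyone.
    # The expensive covariate x exists for everyone but is not yet measured.
    subjects = []
    for i in range(n):
        z = rng.gauss(0, 1)                      # inexpensive covariate
        x = 0.5 * z + rng.gauss(0, 1)            # expensive covariate (assumed model)
        y = 0.3 * x + 0.4 * z + rng.gauss(0, 1)  # continuous outcome (assumed model)
        subjects.append({"id": i, "y": y, "z": z, "x": x})

    # Phase two: selection uses first-phase data only (here, the outcome y).
    # Take roughly half of the n2 subjects from each tail of y.
    ranked = sorted(subjects, key=lambda s: s["y"])
    low = n2 // 2
    high = n2 - low
    selected_ids = {s["id"] for s in ranked[:low] + ranked[n - high:]}

    # The analyst's data set: x is missing (None) for unselected subjects.
    observed = [
        {"id": s["id"], "y": s["y"], "z": s["z"],
         "x": s["x"] if s["id"] in selected_ids else None}
        for s in subjects
    ]
    return observed, selected_ids


if __name__ == "__main__":
    data, ids = two_phase_sample(n=1000, n2=200, seed=1)
    measured = sum(row["x"] is not None for row in data)
    print(f"phase one: {len(data)} subjects; phase two: {measured} with x measured")
```

Because selection depends only on first-phase data, the expensive covariate is missing by design for unselected subjects. The choice of selection rule is what the design question in the abstract concerns; the tail rule here is just one simple possibility.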