Wang Xiaofei, Zhou Haibo
Department of Biostatistics and Bioinformatics, Duke University Medical Center, DUMC 2721, Durham, North Carolina 27710, USA.
Biometrics. 2010 Jun;66(2):502-11. doi: 10.1111/j.1541-0420.2009.01280.x. Epub 2009 Jun 9.
In cancer research, it is important to evaluate, in a large prospective study, the performance of a biomarker (e.g., molecular, genetic, or imaging) that correlates with patients' prognosis or predicts patients' response to treatment. Because of overall budget constraints and the high cost of bioassays, investigators often have to select a subset of all registered patients for biomarker assessment. To detect a potentially moderate association between the biomarker and the outcome, investigators need to decide how to select a subset of fixed size so that study efficiency is enhanced. We show that, instead of drawing a simple random sample from the study cohort, greater efficiency can be achieved by allowing the selection probability to depend on the outcome and an auxiliary variable; we refer to such a sampling scheme as outcome and auxiliary-dependent subsampling (OADS). This article is motivated by the need to analyze data from a lung cancer biomarker study that adopts the OADS design to assess epidermal growth factor receptor (EGFR) mutations as a predictive biomarker for whether a subject responds to a greater extent to EGFR inhibitor drugs. We propose an estimated maximum-likelihood method that accommodates the OADS design and utilizes all observed information, especially that contained in the likelihood score of EGFR mutations (an auxiliary variable for EGFR mutations), which is available for all patients. We derive the asymptotic properties of the proposed estimator and evaluate its finite-sample properties via simulation. We illustrate the proposed method with a data example.
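As a rough illustration of the sampling idea only (not the authors' design or their estimated maximum-likelihood estimator), the following Python sketch draws a fixed-size subsample in which the selection probability depends on a binary outcome and a dichotomized auxiliary score. The cohort is simulated, and every name and number here (`oads_select`, the stratum weights, the 0.5 cut point, the effect sizes) is a hypothetical choice made for the example.

```python
import numpy as np

def oads_select(y, w, n_sub, weights, w_cut, rng):
    """Pick n_sub subjects with selection probability depending on (y, w).

    y       : binary outcome, observed for the whole cohort
    w       : auxiliary score, observed for the whole cohort
    weights : relative selection weight per (outcome, high-score) stratum
              (hypothetical values, not taken from the paper)
    """
    y = np.asarray(y)
    high_w = (np.asarray(w) > w_cut).astype(int)
    rel = np.array([weights[(int(a), int(b))] for a, b in zip(y, high_w)],
                   dtype=float)
    p = rel / rel.sum()
    # Unequal-probability draw without replacement; the realized inclusion
    # probabilities are only approximately proportional to the weights.
    return rng.choice(len(y), size=n_sub, replace=False, p=p)

rng = np.random.default_rng(0)
n = 2000

# Simulated cohort (all parameters are assumptions for illustration).
x = rng.binomial(1, 0.15, size=n)                 # costly biomarker status
w = np.clip(0.6 * x + rng.normal(0.3, 0.2, size=n), 0, 1)  # cheap auxiliary score
y = rng.binomial(1, 1 / (1 + np.exp(-(-1.5 + 1.0 * x))))   # binary response

weights = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 3.0, (1, 1): 6.0}
idx = oads_select(y, w, n_sub=400, weights=weights, w_cut=0.5, rng=rng)
srs = rng.choice(n, size=400, replace=False)      # simple random sample

print("responders: cohort", y.mean(), "SRS", y[srs].mean(), "OADS", y[idx].mean())
print("biomarker+: cohort", x.mean(), "SRS", x[srs].mean(), "OADS", x[idx].mean())
```

Because the subsample is deliberately enriched, a naive analysis restricted to the selected subjects would generally be biased; that is why the analysis needs a method that accounts for the selection mechanism, such as the estimated maximum-likelihood approach proposed in the article, which this sketch does not implement.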