Division of Epidemiology, Statistics and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH/DHHS, Rockville, MD 20852, USA.
Stat Med. 2010 Feb 28;29(5):597-613. doi: 10.1002/sim.3823.
Evaluating biomarkers in epidemiological studies can be expensive and time-consuming. Many investigators use techniques such as random sampling or pooling of biospecimens to cut costs and save time on experiments. Analyses based on pooled data are commonly restricted by strong distributional assumptions that are difficult to validate, because only pooled measurements are observed. Random sampling provides data that can be easily analyzed, but random sampling designs are not cost-optimal for estimating means. We propose and examine a cost-efficient hybrid design that takes a sample of both pooled and unpooled data in an optimal proportion in order to efficiently estimate the unknown parameters of the biomarker distribution. In addition, we find that this design can be used to estimate and account for different types of measurement and pooling error without the need to collect validation data or repeated measurements. We show an example in which the hybrid design minimizes a given loss function based on the variances of the estimators of the unknown parameters. Monte Carlo simulations and biomarker data from a study on coronary heart disease are used to demonstrate the proposed methodology.
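To make the idea of combining pooled and unpooled assays concrete, the following is a minimal Monte Carlo sketch in Python. It is not the authors' estimator. It assumes a simple additive model that the abstract does not specify: individual biomarker values are normal with mean mu and variance sigma_x^2, every assay adds independent normal measurement error with variance sigma_m^2, and a pooled assay measures the average of pool_size specimens. Under those assumptions an unpooled assay has variance sigma_x^2 + sigma_m^2 and a pooled assay has variance sigma_x^2 / pool_size + sigma_m^2, so the two sample variances together identify both components without validation data. The function names, the method-of-moments estimator, and all parameter values are illustrative assumptions.

```python
import random
import statistics


def simulate_hybrid(n_assays, frac_unpooled, pool_size, mu, sigma_x, sigma_m, rng):
    """Simulate a hybrid design: some assays on single specimens, the rest on pools.

    Assumed model (not taken from the paper): assay = true value + N(0, sigma_m^2),
    where the true value of a pool is the average of its specimens.
    """
    n_unpooled = round(n_assays * frac_unpooled)
    n_pooled = n_assays - n_unpooled
    unpooled = [rng.gauss(mu, sigma_x) + rng.gauss(0.0, sigma_m)
                for _ in range(n_unpooled)]
    pooled = []
    for _ in range(n_pooled):
        specimens = [rng.gauss(mu, sigma_x) for _ in range(pool_size)]
        pooled.append(statistics.fmean(specimens) + rng.gauss(0.0, sigma_m))
    return unpooled, pooled


def moment_estimates(unpooled, pooled, pool_size):
    """Method-of-moments estimates of (mu, sigma_x^2, sigma_m^2).

    Uses Var(unpooled) = sigma_x^2 + sigma_m^2 and
         Var(pooled)   = sigma_x^2 / pool_size + sigma_m^2.
    Estimates can be negative in small samples; this sketch does not truncate them.
    """
    var_u = statistics.variance(unpooled)
    var_p = statistics.variance(pooled)
    sigma_x2 = (var_u - var_p) / (1.0 - 1.0 / pool_size)
    sigma_m2 = var_u - sigma_x2
    # Combine the two sample means with inverse-variance weights.
    w_u = len(unpooled) / var_u
    w_p = len(pooled) / var_p
    mu_hat = (w_u * statistics.fmean(unpooled) + w_p * statistics.fmean(pooled)) / (w_u + w_p)
    return mu_hat, sigma_x2, sigma_m2


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed so the run is reproducible
    unpooled, pooled = simulate_hybrid(
        n_assays=20000, frac_unpooled=0.5, pool_size=4,
        mu=5.0, sigma_x=2.0, sigma_m=1.0, rng=rng,
    )
    mu_hat, sigma_x2, sigma_m2 = moment_estimates(unpooled, pooled, pool_size=4)
    print(f"mu_hat={mu_hat:.3f}  sigma_x2_hat={sigma_x2:.3f}  sigma_m2_hat={sigma_m2:.3f}")
```

Repeating the simulation over a grid of frac_unpooled values and comparing the empirical variances of the estimates is one way to explore how the proportion of pooled to unpooled assays affects a variance-based loss, in the spirit of the design question the abstract raises.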