Mitchell Emily M, Lyles Robert H, Manatunga Amita K, Danaher Michelle, Perkins Neil J, Schisterman Enrique F
Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, U.S.A.
Biometrics. 2014 Mar;70(1):202-11. doi: 10.1111/biom.12134. Epub 2014 Feb 12.
Epidemiological studies involving biomarkers are often hindered by prohibitively expensive laboratory tests. Strategically pooling specimens prior to performing these lab assays has been shown to effectively reduce cost with minimal information loss in a logistic regression setting. When the goal is to perform regression with a continuous biomarker as the outcome, regression analysis of pooled specimens may not be straightforward, particularly if the outcome is right-skewed. In such cases, we demonstrate that a slight modification of a standard multiple linear regression model for poolwise data can provide valid and precise coefficient estimates when pools are formed by combining biospecimens from subjects with identical covariate values. When these x-homogeneous pools cannot be formed, we propose a Monte Carlo expectation maximization (MCEM) algorithm to compute maximum likelihood estimates (MLEs). Simulation studies demonstrate that these analytical methods provide essentially unbiased estimates of coefficient parameters as well as their standard errors when appropriate assumptions are met. Furthermore, we show how one can utilize the fully observed covariate data to inform the pooling strategy, yielding a high level of statistical efficiency at a fraction of the total lab cost.
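As a rough illustration of the x-homogeneous pooling idea described in the abstract, the sketch below simulates individual-level data, forms pools only among subjects who share identical covariate values, and fits a weighted least squares regression to the poolwise mean outcomes. It is not the authors' method. It assumes normally distributed errors and that an assay on a pooled specimen returns the arithmetic mean of its members' values, so it covers neither the modification for right-skewed outcomes nor the MCEM algorithm. The function name `simulate_pooled_fit`, the coefficient values, the covariate structure, and the pool size are all hypothetical choices made for the example.

```python
import numpy as np


def simulate_pooled_fit(n_subjects=6000, pool_size=3, seed=0):
    """Simulate x-homogeneous pooling and fit WLS to pool means.

    Illustrative only: assumes normal errors and that each pooled assay
    measures the mean of the pooled members' outcomes.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical true coefficients: intercept, binary x1, three-level x2.
    beta = np.array([1.0, 0.5, -0.3])
    x1 = rng.integers(0, 2, n_subjects)
    x2 = rng.integers(0, 3, n_subjects)
    y = beta[0] + beta[1] * x1 + beta[2] * x2 + rng.normal(0.0, 1.0, n_subjects)

    # Form pools only among subjects with identical (x1, x2) values.
    rows = []
    for a in range(2):
        for b in range(3):
            idx = np.flatnonzero((x1 == a) & (x2 == b))
            for start in range(0, len(idx), pool_size):
                members = idx[start:start + pool_size]
                # One assay per pool: record covariates, mean outcome, pool size.
                rows.append((a, b, y[members].mean(), len(members)))
    pools = np.array(rows, dtype=float)

    X = np.column_stack([np.ones(len(pools)), pools[:, 0], pools[:, 1]])
    y_bar = pools[:, 2]
    k = pools[:, 3]

    # Var(pool mean) = sigma^2 / k, so weight each pool by its size k.
    # WLS is done by scaling rows by sqrt(k) and solving ordinary least squares.
    w = np.sqrt(k)
    beta_hat, *_ = np.linalg.lstsq(X * w[:, None], y_bar * w, rcond=None)
    return beta_hat, len(pools)


if __name__ == "__main__":
    beta_hat, n_pools = simulate_pooled_fit()
    print("number of assays (pools):", n_pools)
    print("estimated coefficients:", beta_hat)
```

Under these assumptions, every member of a pool shares one covariate vector, so the size-weighted fit to the pool means returns the same coefficient estimates as ordinary least squares on the individual-level data, while requiring roughly one assay per `pool_size` subjects.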