Danaher Michelle R, Albert Paul S, Roy Anindya, Schisterman Enrique F
Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland, U.S.A.
Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, Maryland, U.S.A.
Stat Med. 2016 Apr 30;35(9):1502-13. doi: 10.1002/sim.6798. Epub 2015 Nov 9.
Pooling, or physically mixing biospecimens, prior to evaluating biomarkers dramatically reduces biomarker evaluation cost, reduces the quantity of biospecimen required from each individual, and may reduce the percentage of laboratory measurements below the lower limit of detection. Motivated by a case-control study on miscarriage (binary outcome) and cytokines (continuous exposures), we are interested in estimating parameters in a logistic regression where individuals with the same disease status (with or without a miscarriage) are paired and their pooled cytokine concentrations are assessed. Previous research has proposed a set-based logistic model to evaluate the relationship between a disease and pooled exposures. While the set-based logistic model is very useful for estimating main effects, it cannot estimate interactions between continuous exposures when both are measured in pools. Therefore, we propose using the expectation-maximization (EM) algorithm to obtain estimators of all parameters in a logistic regression model, including interaction effects. Using a simulation study, we compare efficiency under different scenarios in which exposures have been measured in pools and individually. Our simulations show that randomly sampling half of the available biospecimens is less efficient than pooling pairs of biospecimens stratified by disease status. The EM algorithm provides a method for estimating interaction effects when biospecimens have already been pooled for other reasons, such as the gain in efficiency for estimating main effects demonstrated by previous research. This manuscript demonstrates that the EM algorithm offers a promising approach to estimating interaction effects from pooled biospecimens.
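The limitation noted above, that pooled measurements support main effects but not interactions, can be illustrated with a small arithmetic sketch. It is not taken from the paper: the exposure values are hypothetical, and it assumes that a pool of two equal-volume specimens yields the average of the two individual concentrations. Under that assumption, a pair's total for each exposure is recoverable from its pooled value, but the pair's total of individual-level products is not.

```python
from statistics import mean


def pool(values):
    """Assumed pooled assay: the average concentration of the pool's members."""
    return mean(values)


# Hypothetical exposures for one pair of individuals with the same disease status.
x1 = [1.0, 3.0]  # exposure 1 (e.g., one cytokine)
x2 = [2.0, 6.0]  # exposure 2 (e.g., another cytokine)

pooled_x1 = pool(x1)
pooled_x2 = pool(x2)

# Main-effect term: the pair total equals pool size times the pooled value.
main_individual = sum(x1)
main_from_pool = len(x1) * pooled_x1

# Interaction term: the pair total of products x1 * x2 is NOT determined
# by the two pooled values alone.
interaction_individual = sum(a * b for a, b in zip(x1, x2))
interaction_from_pool = len(x1) * pooled_x1 * pooled_x2

print("main effect, individual vs. pooled:", main_individual, main_from_pool)
print("interaction, individual vs. pooled:", interaction_individual, interaction_from_pool)
```

The two main-effect totals agree, while the two interaction totals differ. In this sketch the individual-level products behave like unobserved data, which is the kind of setting the EM algorithm is designed to handle.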