Division of Biostatistics, University of Minnesota School of Public Health, Minneapolis, MN, USA.
Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ, USA.
Stat Methods Med Res. 2022 Apr;31(4):579-593. doi: 10.1177/09622802211013578. Epub 2022 Feb 6.
There is a growing demand for methods to determine the effects that chemical mixtures have on human health. One statistical challenge is identifying true "bad actors" from a mixture of highly correlated predictors, a setting in which standard approaches such as linear regression become highly variable. Weighted Quantile Sum regression has been proposed to address this problem, through a two-step process where mixture component weights are estimated using bootstrap aggregation in a training dataset and inference on the overall mixture effect occurs in a held-out test set. Weighted Quantile Sum regression is popular in applied papers, but the reliance on data splitting is suboptimal, and analysts who use the same data for both steps risk inflating the Type I error rate. We therefore propose a modification of Weighted Quantile Sum regression that uses a permutation test for inference, which allows for weight estimation using the entire dataset and preserves Type I error. To minimize computational burden, we propose replacing the bootstrap with L1 or L2 penalization and describe how to choose the appropriate penalty given expert knowledge about a mixture of interest. We apply our method to a national pregnancy cohort study of prenatal phthalate exposure and child health outcomes.
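The permutation idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the quartile scoring, the clipping of negative ridge coefficients to zero, the fixed penalty `lam`, and the one-sided p-value are all assumptions made for illustration. The point it shows is that weights are re-estimated on the full dataset within every permutation, so the null distribution accounts for the weight-estimation step without a held-out test set.

```python
import numpy as np


def quantile_scores(X, n_quantiles=4):
    """Replace each exposure column with integer quantile scores 0..n_quantiles-1."""
    scores = np.empty(X.shape, dtype=float)
    inner = np.linspace(0, 1, n_quantiles + 1)[1:-1]
    for j in range(X.shape[1]):
        cuts = np.quantile(X[:, j], inner)
        scores[:, j] = np.searchsorted(cuts, X[:, j], side="right")
    return scores


def ridge_weights(Q, y, lam):
    """L2-penalized least squares on centered data.

    Assumption for this sketch: negative coefficients are clipped to zero and
    the remainder is normalized to sum to one, giving mixture weights.
    """
    Qc = Q - Q.mean(axis=0)
    yc = y - y.mean()
    p = Q.shape[1]
    beta = np.linalg.solve(Qc.T @ Qc + lam * np.eye(p), Qc.T @ yc)
    beta = np.clip(beta, 0.0, None)
    total = beta.sum()
    return beta / total if total > 0 else np.full(p, 1.0 / p)


def mixture_slope(Q, y, lam):
    """Estimate weights, form the weighted index, and return its regression slope."""
    w = ridge_weights(Q, y, lam)
    index = Q @ w
    ic = index - index.mean()
    slope = float(ic @ (y - y.mean()) / (ic @ ic))
    return slope, w


def permutation_test(X, y, lam=1.0, n_perm=500, seed=0):
    """One-sided permutation p-value for the overall mixture effect.

    The outcome is shuffled and the whole procedure (weights and slope) is
    repeated on each shuffled copy, using all observations every time.
    """
    rng = np.random.default_rng(seed)
    Q = quantile_scores(X)
    observed, weights = mixture_slope(Q, y, lam)
    null = np.array(
        [mixture_slope(Q, rng.permutation(y), lam)[0] for _ in range(n_perm)]
    )
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, weights, p_value


if __name__ == "__main__":
    # Simulated, equicorrelated exposures; only the first one affects the outcome.
    rng = np.random.default_rng(1)
    n, p, rho = 300, 5, 0.7
    cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    y = X[:, 0] + rng.normal(size=n)
    slope, weights, p_value = permutation_test(X, y, n_perm=200)
    print("slope:", slope)
    print("weights:", weights)
    print("p-value:", p_value)
```

An L1 penalty could be substituted for the ridge step when only a few components are expected to matter; the closed-form L2 solution is used here only because it keeps the sketch dependency-free.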