Hao Wei, Cathey Amber L, Aung Max M, Boss Jonathan, Meeker John D, Mukherjee Bhramar
Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.
Department of Environmental Health Sciences, University of Michigan, Ann Arbor, Michigan, USA.
Environ Health Perspect. 2025 Jun;133(6):67019. doi: 10.1289/EHP15305. Epub 2025 Jun 19.
Quantitative characterization of the health impacts associated with exposure to chemical mixtures has received considerable attention in environmental and epidemiological research. Given the many existing statistical methods and emerging approaches, it is important for practitioners to understand which method is best suited to their inferential goals.
The goal of this paper is to provide empirical, simulation-based evidence on the performance of mixture methods to help researchers select the best available methods for three scientific questions in mixtures analysis: identifying important components of a mixture, identifying interactions among mixture components, and creating a summary score for risk stratification and prediction.
We conducted a review and comparison of 11 analytical methods available for mixtures research through extensive simulation studies with continuous and binary outcomes. In addition, we carried out an illustrative data analysis using the PROTECT birth cohort from Puerto Rico to examine the associations between birth outcomes and exposure to chemical mixtures of metals, polycyclic aromatic hydrocarbons (PAHs), phthalates, and phenols.
Our simulation results suggest that the choice of method depends on the goal of the analysis and that there is no clear winner across the board. For selecting important toxicants in the mixtures and for identifying interactions, Elastic net (Enet) by Zou et al., Lasso for Hierarchical Interactions (HierNet) by Bien et al., and selection of nonlinear interactions by a forward stepwise algorithm (SNIF) by Narisetty et al. have the most stable performance across simulation settings. For an overall summary or cumulative measure, we find that using the Super Learner to combine multiple environmental risk scores can lead to improved risk stratification and prediction properties.
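To make the variable-selection idea behind Enet concrete, the following is a minimal, dependency-free Python sketch of coordinate descent for the elastic-net objective in the glmnet-style parameterization, (1/2n)||y − Xb||² + λ(α||b||₁ + (1 − α)/2 ||b||₂²). It is not the CompMix package, not the authors' simulation design, and not the reference Enet implementation. The toy data generator (`simulate`, six independent standard-normal "exposures", only the first two affecting the outcome) and the fixed penalty values `lam=0.4`, `alpha=0.5` are hypothetical choices for illustration only. Exposures whose coefficients are shrunk exactly to zero are treated as not selected.

```python
import math
import random


def standardize(columns):
    """Center each column and scale it so that (1/n) * sum(x^2) == 1."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        centered = [v - mean for v in col]
        scale = math.sqrt(sum(v * v for v in centered) / n)
        out.append([v / scale for v in centered])
    return out


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(columns, y, lam, alpha, tol=1e-8, max_iter=1000):
    """Cyclic coordinate descent for the glmnet-style elastic-net objective.

    Assumes standardized columns ((1/n) * x_j'x_j == 1) and a centered y,
    so no intercept is fitted.
    """
    n, p = len(y), len(columns)
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta starting at zero
    for _ in range(max_iter):
        max_change = 0.0
        for j, col in enumerate(columns):
            # Inner product of x_j with the partial residual that excludes x_j.
            z = sum(x * r for x, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * x for r, x in zip(resid, col)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def simulate(n=1000, p=6, seed=1):
    """Hypothetical toy mixture: p independent standard-normal exposures,
    of which only the first two are associated with the outcome."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    y = [1.0 * columns[0][i] + 0.8 * columns[1][i] + rng.gauss(0.0, 1.0)
         for i in range(n)]
    mean_y = sum(y) / n
    return columns, [v - mean_y for v in y]  # center the outcome


if __name__ == "__main__":
    cols, y = simulate()
    beta = elastic_net(standardize(cols), y, lam=0.4, alpha=0.5)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    print("coefficients:", [round(b, 3) for b in beta])
    print("selected exposures:", selected)
```

The L1 part of the penalty produces exact zeros (selection), while the L2 part stabilizes estimates when exposures are correlated, which is typical of real mixtures. In a real analysis the penalty would usually be tuned by cross-validation rather than fixed as it is here.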
We developed an integrated R package, "CompMix", that provides a platform on which practitioners can implement a pipeline combining several approaches for mixtures analysis. Our study offers guidelines for selecting appropriate statistical methods to address specific scientific questions in mixtures research. We also identify critical gaps where new and better methods are needed. https://doi.org/10.1289/EHP15305.