Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35000 Rennes, France.
ISGlobal, 08003 Barcelona, Spain.
Environ Sci Technol. 2023 Oct 31;57(43):16232-16243. doi: 10.1021/acs.est.3c04805. Epub 2023 Oct 16.
The exposome concept aims to consider all environmental stressors simultaneously. The high dimensionality of the data and the correlations that may exist between exposures raise various statistical challenges. Some methodological studies have provided insight into the efficiency of specific modeling approaches when exposome data are assessed once per subject. However, few studies have considered the situation in which environmental exposures are assessed repeatedly. Here, we conduct a simulation study to compare the performance of statistical approaches for assessing exposome-health associations in the context of multiple exposure variables assessed repeatedly. Different scenarios were tested, assuming different types and numbers of exposure-outcome causal relationships. An application study using real data collected within the INMA mother-child cohort (Spain) is also presented. In the simulation experiment, the assessed methods showed varying performance across scenarios, making it challenging to recommend a one-size-fits-all strategy. Generally, methods such as sparse partial least-squares and the deletion-substitution-addition algorithm tended to outperform the other tested methods (exposome-wide association study [ExWAS], Elastic-Net, distributed lag nonlinear models [DLNM], or sparse N-way partial least-squares [sNPLS]). Notably, as the number of true predictors increased, the performance of all methods declined. The absence of a clearly superior approach underscores the additional challenges posed by repeated exposome data, such as more complex correlation structures and interdependencies between variables, and highlights that careful consideration is essential when selecting the appropriate statistical method. In this regard, we provide recommendations based on the expected scenario. Given the heightened risk of reporting false positive or false negative associations when applying these techniques to repeated exposome data, we advise interpreting the results with caution, particularly in compromised contexts such as those with a limited sample size.
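To make the setting concrete, the following is a minimal, hypothetical sketch of a repeated-exposome simulation scored with an ExWAS-style analysis (one regression per exposure variable, corrected for multiple testing). It is not the authors' simulation protocol. The function names, the Gaussian data, the linear effects, the shared-component correlation between visits of the same exposure (`rho`), the Bonferroni correction, and the normal approximation for p-values are all illustrative assumptions. It shows how each exposure-visit column becomes a separate candidate predictor and how selections can be scored by sensitivity and false discovery proportion (FDP).

```python
import math
import numpy as np


def simulate_repeated_exposome(n_subjects=300, n_exposures=20, n_times=3,
                               n_true=3, effect=0.3, rho=0.5, seed=0):
    """Hypothetical simulation: each exposure is measured at n_times visits.

    Visits of the same exposure share a subject-level component, so they are
    correlated at rho (an illustrative assumption, not the paper's design).
    """
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((n_subjects, n_exposures, 1))
    noise = rng.standard_normal((n_subjects, n_exposures, n_times))
    X = math.sqrt(rho) * shared + math.sqrt(1 - rho) * noise
    # One column per exposure-visit pair: column = exposure * n_times + visit
    X = X.reshape(n_subjects, n_exposures * n_times)
    true_idx = rng.choice(X.shape[1], size=n_true, replace=False)
    y = X[:, true_idx] @ np.full(n_true, effect) + rng.standard_normal(n_subjects)
    return X, y, set(true_idx.tolist())


def exwas(X, y, alpha=0.05):
    """ExWAS-style screen: one simple linear regression per column, Bonferroni-corrected."""
    n, p = X.shape
    yc = y - y.mean()
    selected = set()
    for j in range(p):
        x = X[:, j] - X[:, j].mean()
        beta = (x @ yc) / (x @ x)
        resid = yc - beta * x
        se = math.sqrt((resid @ resid) / (n - 2) / (x @ x))
        # Two-sided p-value using a normal approximation to the t statistic
        p_value = math.erfc(abs(beta / se) / math.sqrt(2))
        if p_value < alpha / p:
            selected.add(j)
    return selected


def sensitivity_and_fdp(selected, truth):
    """Share of true predictors found, and share of selections that are false."""
    tp = len(selected & truth)
    sensitivity = tp / len(truth)
    fdp = (len(selected) - tp) / len(selected) if selected else 0.0
    return sensitivity, fdp


if __name__ == "__main__":
    X, y, truth = simulate_repeated_exposome()
    selected = exwas(X, y)
    sens, fdp = sensitivity_and_fdp(selected, truth)
    print(f"sensitivity={sens:.2f}, FDP={fdp:.2f}")
```

Because visits of the same exposure are correlated, a univariate screen can also flag non-causal visits of a truly associated exposure. This is one way the more complex correlation structures of repeated exposome data can inflate false positives. Varying `rho`, `n_true`, or `n_subjects` in this sketch is a quick way to explore that sensitivity, under the stated assumptions.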