Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Epidemiology, Columbia University Mailman School of Public Health, New York, NY, USA.
Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Environ Res. 2021 Apr;195:110524. doi: 10.1016/j.envres.2020.110524. Epub 2020 Nov 26.
Variation in the timing of menarche has been linked with adverse health outcomes in later life. There is evidence that exposure to hormonally active agents (or endocrine disrupting chemicals; EDCs) during childhood may play a role in accelerating or delaying menarche. The goal of this study was to generate hypotheses on the relationship between exposure to multiple EDCs and timing of menarche by applying a two-stage machine learning approach.
We used data from the National Health and Nutrition Examination Survey (NHANES) for the years 2005-2008. Data were analyzed for 229 female participants 12-16 years of age who had blood and urine biomarker measures of 41 environmental exposures in seven classes of chemicals, each with >70% of measurements above the limit of detection. We modeled the risk of earlier menarche (<12 years of age vs older) as a function of the exposure biomarkers. We applied a two-stage approach: a random forest (RF) to identify important exposure combinations associated with timing of menarche, followed by multivariable modified Poisson regression to quantify associations between exposure profiles ("combinations") and timing of menarche.
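As a rough illustration of the first stage, the sketch below shows the basic operation a tree in a random forest repeats: choosing the biomarker cut-point that best separates participants into more homogeneous subgroups. It is not the authors' implementation. The function names `gini` and `best_split`, the use of Gini impurity as the splitting criterion, and the eight data points are all assumptions made for illustration; a real analysis would use an established random forest library on the full set of biomarkers.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcome labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Find the 'value <= threshold' split with the lowest weighted impurity.

    Returns (threshold, weighted_impurity); threshold is None when no split
    improves on the unsplit group.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot cut between identical concentrations
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2.0, score)  # midpoint as the cut-point
    return best


# Hypothetical data, invented for this example (not NHANES values):
# biomarker concentrations in ng/mL and outcome (1 = menarche before age 12).
concentrations = [0.8, 1.1, 1.9, 2.2, 2.6, 3.4, 4.0, 5.3]
early_menarche = [1, 1, 1, 0, 0, 0, 1, 0]

threshold, impurity = best_split(concentrations, early_menarche)
print(f"split at <= {threshold:.2f} ng/mL, weighted Gini = {impurity:.2f}")
```

A forest repeats this search over bootstrap samples and random subsets of biomarkers, and the sequence of cut-points along a tree path defines an exposure profile of the kind described above.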
RF identified urinary concentrations of monoethylhexyl phthalate (MEHP) as the most important feature in partitioning girls into homogeneous subgroups, followed by bisphenol A (BPA) and 2,4-dichlorophenol (2,4-DCP). In this first stage, we identified 11 distinct exposure biomarker profiles, containing five different classes of EDCs, associated with earlier menarche. MEHP appeared in all 11 exposure biomarker profiles and phenols appeared in five. Using these profiles in the second stage of analysis, we found a relationship between lower MEHP and earlier menarche (MEHP ≤ 2.36 ng/mL vs >2.36 ng/mL: adjusted PR = 1.36, 95% CI: 1.02, 1.80). Combinations of lower MEHP with benzophenone-3, 2,4-DCP, and BPA had similar associations with earlier menarche, though slightly weaker in those smaller subgroups. For girls not having lower MEHP, exposure profiles included other biomarkers (BPA, enterodiol, monobenzyl phthalate, triclosan, and 1-hydroxypyrene); these showed largely null associations in the second-stage analysis. Adjustment for covariates did not materially change the estimates or CIs of these models. We observed weak or null effect estimates for some exposure biomarker profiles, and relevant profiles consisted of no more than two EDCs, possibly due to small sample sizes in subgroups.
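To make the prevalence ratios (PR) reported above more concrete, the sketch below computes a crude, unadjusted PR with a Wald confidence interval on the log scale for a single binary exposure profile. This is a simplification, not the multivariable modified Poisson model used in the study, which also adjusts for covariates. The function name `prevalence_ratio` and the counts are hypothetical and chosen only for illustration.

```python
import math


def prevalence_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Crude prevalence ratio with a Wald CI computed on the log scale.

    Assumes at least one case in each group, so the log and the
    standard error are defined.
    """
    p1 = cases_exposed / n_exposed
    p0 = cases_unexposed / n_unexposed
    pr = p1 / p0
    # Delta-method standard error of log(PR) for two independent proportions.
    se_log_pr = math.sqrt((1 - p1) / cases_exposed + (1 - p0) / cases_unexposed)
    lower = math.exp(math.log(pr) - z * se_log_pr)
    upper = math.exp(math.log(pr) + z * se_log_pr)
    return pr, lower, upper


# Hypothetical counts, invented for this example (not the study's data):
# 30 of 60 girls in the profile had earlier menarche, versus 20 of 80 outside it.
pr, lower, upper = prevalence_ratio(30, 60, 20, 80)
print(f"PR = {pr:.2f}, 95% CI: {lower:.2f}, {upper:.2f}")
```

A PR above 1 with a confidence interval that excludes 1, as in the MEHP result above, indicates that earlier menarche was more prevalent in the profile group than in the comparison group.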
A two-stage approach incorporating machine learning was able to identify interpretable combinations of biomarkers in relation to timing of menarche; these should be further explored in prospective studies. Machine learning methods can serve as a valuable tool for identifying patterns in data and generating hypotheses that can be investigated in future targeted analyses.