Considerations for using tree-based machine learning to assess causation between demographic and environmental risk factors and health outcomes.

Affiliations

Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Canada.

Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Canada.

Publication information

Environ Sci Pollut Res Int. 2024 Nov;31(51):60927-60935. doi: 10.1007/s11356-024-35304-4. Epub 2024 Oct 12.

Abstract

Evaluating the heterogeneous treatment effect (HTE) allows the causal effect of a therapy or intervention to be assessed while accounting for heterogeneity in individual factors within a population. Machine learning (ML) methods have previously been employed for HTE evaluation, addressing the limitations associated with modelling complex systems. In this work, three tree-based ML algorithms, causal random forest (CRF), causal Bayesian additive regression trees (CBART), and causal rule ensemble (CRE), are used to analyze whether benzene exposure potentially causes childhood acute myeloid leukemia (AML). Data for this analysis are generated by drawing samples from a previously developed model that estimates AML probability from demographic information and benzene exposure. The three algorithms are compared in terms of the predicted average treatment effect (ATE), the regression coefficient of determination, and computational time. Minimal difference is reported among the three algorithms in the ATE and in the regression coefficient of determination. However, CRF outperforms CBART in computational time. Moreover, CRF allows both continuous and binary treatment variables, unlike CBART and CRE, making it better suited to environmental health studies, where pollutant exposure levels should be considered continuous. Following the comparison of all three algorithms, the influence of adding Gaussian noise to the treatment and outcome variables, as well as the influence of outliers, is investigated using CRF. A set of considerations is drawn up to guide researchers in using these algorithms. These considerations cover simulation settings, applications, and the interpretation of results, and aim to provide timely information for decision-making on pollutant exposure thresholds in environmental risk assessments.
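To make the quantities in the abstract concrete: the ATE is the population average of Y(1) - Y(0), and the HTE describes how that difference varies with individual covariates. The following is a minimal sketch in plain Python, not the authors' code and not CRF, CBART, or CRE. It swaps in a simple difference-in-means estimator under randomized binary exposure. The logistic model, its coefficients, the single `age` covariate, and all function names are hypothetical stand-ins for the previously developed AML model, which the abstract does not specify.

```python
import math
import random


def outcome_probability(age, exposed):
    """Hypothetical logistic outcome model (an assumption, not the paper's AML model)."""
    # Assumed coefficients: the exposure effect shrinks as age increases.
    logit = -3.0 + 0.05 * age + exposed * (1.0 - 0.05 * age)
    return 1.0 / (1.0 + math.exp(-logit))


def simulate(n, noise_sd=0.0, seed=0):
    """Draw (age, exposed, outcome) rows with randomized binary exposure.

    The outcome is the model probability plus optional Gaussian noise.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(0.0, 15.0)
        exposed = rng.randint(0, 1)
        y = outcome_probability(age, exposed) + rng.gauss(0.0, noise_sd)
        rows.append((age, exposed, y))
    return rows


def diff_in_means(rows):
    """Estimate the treatment effect as mean(exposed) - mean(unexposed)."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def true_ate(rows):
    """Ground-truth ATE over the sampled ages, available because the model is known."""
    effects = [outcome_probability(a, 1) - outcome_probability(a, 0) for a, _, _ in rows]
    return sum(effects) / len(effects)


if __name__ == "__main__":
    clean = simulate(20000, noise_sd=0.0, seed=1)
    noisy = simulate(20000, noise_sd=0.05, seed=1)
    print("true ATE:         ", true_ate(clean))
    print("estimate (clean): ", diff_in_means(clean))
    print("estimate (noisy): ", diff_in_means(noisy))
    # Subgroup estimates illustrate heterogeneity in the treatment effect.
    young = [r for r in clean if r[0] < 5.0]
    older = [r for r in clean if r[0] >= 10.0]
    print("effect, age < 5:  ", diff_in_means(young))
    print("effect, age >= 10:", diff_in_means(older))
```

Because the data come from a known model, the estimate can be checked against the ground-truth ATE, which is the same advantage the simulated setting in the abstract provides. Tree-based methods such as CRF replace the hand-picked age bins here with data-driven partitions of the covariate space.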
