Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series.

Affiliations

Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, NJ 08854, USA.

Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Publication information

Int J Environ Res Public Health. 2022 Dec 1;19(23):16080. doi: 10.3390/ijerph192316080.

Abstract

Tree-based machine learning methods have gained traction in the statistical and data science fields. They have been shown to provide better solutions to various research questions than traditional analysis approaches. To encourage the uptake of tree-based methods in health research, we review the methodological fundamentals of three key tree-based machine learning methods: random forests, extreme gradient boosting and Bayesian additive regression trees. We then present a series of case studies that illustrate how these methods can be used properly to solve important health research problems in four domains: variable selection, estimation of causal effects, propensity score weighting and missing data. We explain that the central idea behind using ensemble tree methods for these research questions is accurate prediction via flexible modeling. We applied ensemble tree methods to select important predictors of postoperative respiratory complications among early-stage lung cancer patients with resectable tumors. We then demonstrated how to use these methods to estimate the causal effects of popular surgical approaches on postoperative respiratory complications among lung cancer patients. Using the same data, we further applied the methods to accurately estimate the inverse probability weights for a propensity score analysis of the comparative effectiveness of the surgical approaches. Finally, we demonstrated how random forests can be used to impute missing data, using the Study of Women's Health Across the Nation data set. To conclude, tree-based methods are a flexible tool that should be used properly in health investigations.
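The propensity score weighting step mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary treatment (the study compares surgical approaches, which may involve more than two groups), it takes the estimated propensity scores as given (in practice they would come from a fitted random forest, extreme gradient boosting or Bayesian additive regression trees model), and the function names, the `clip` bound and the toy numbers are all hypothetical.

```python
from typing import List, Sequence


def inverse_probability_weights(
    treatment: Sequence[int],
    propensity: Sequence[float],
    clip: float = 0.01,
) -> List[float]:
    """Return inverse probability weights for a binary treatment.

    treatment  : 1 for treated, 0 for comparison (binary case only,
                 a simplifying assumption of this sketch).
    propensity : estimated P(treatment = 1 | covariates); assumed to be
                 the output of some tree ensemble fitted elsewhere.
    clip       : hypothetical truncation bound that keeps each score in
                 [clip, 1 - clip] so that no single weight explodes.
    """
    if len(treatment) != len(propensity):
        raise ValueError("treatment and propensity must have equal length")
    weights = []
    for t, e in zip(treatment, propensity):
        e = min(max(e, clip), 1.0 - clip)  # truncate extreme scores
        # Treated units get 1/e, comparison units get 1/(1 - e).
        weights.append(1.0 / e if t == 1 else 1.0 / (1.0 - e))
    return weights


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of `values`, normalised by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Made-up numbers purely for illustration (not taken from the study).
treatment = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]
weights = inverse_probability_weights(treatment, propensity)
```

A weighted outcome comparison would then apply `weighted_mean` separately within the treated and comparison groups. Truncating the scores is one common way to stabilise the weights; whether and how the paper does this is not stated in the abstract.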

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/758c/9736500/e901cb248c7a/ijerph-19-16080-g001.jpg
