Decision trees in epidemiological research.

Author information

Venkatasubramaniam Ashwini, Wolfson Julian, Mitchell Nathan, Barnes Timothy, JaKa Meghan, French Simone

Affiliations

Urban Big Data Centre, University of Glasgow, 7 Lilybank Gardens, Glasgow, G12 8RZ UK.

Division of Biostatistics, University of Minnesota, Twin Cities, A453 Mayo Building, MMC 303, 420 Delaware St SE, Minneapolis, MN 55455 USA.

Publication information

Emerg Themes Epidemiol. 2017 Sep 20;14:11. doi: 10.1186/s12982-017-0064-4. eCollection 2017.

DOI:10.1186/s12982-017-0064-4

PMID:28943885

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC5607590/

Abstract

BACKGROUND

In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods.

MAIN TEXT

We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees.

CONCLUSIONS

Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
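
The contrast the abstract draws between CART and CTree can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce any R package; the function names (`best_split`, `permutation_pvalue`, `grow_one_split`), the toy data, and the 0.05 significance level are all assumptions made for illustration. The CART-style step picks the covariate threshold that makes the two resulting subgroups most homogeneous in the outcome. The CTree-style step, heavily simplified here, first runs a permutation test of association between covariate and outcome and splits only if the test is significant.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """CART-style search: threshold on x that minimizes within-subgroup SSE."""
    best_threshold, best_cost = None, sse(y)
    # Skip the largest value: splitting there would leave the right group empty.
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_threshold, best_cost = t, cost
    return best_threshold


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value for association between x and y (absolute covariance)."""
    rng = random.Random(seed)
    mx, my = mean(x), mean(y)

    def stat(ys):
        return abs(sum((xi - mx) * (yi - my) for xi, yi in zip(x, ys)))

    observed = stat(y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def grow_one_split(x, y, alpha=0.05):
    """Simplified CTree-style rule: test first, split only if significant."""
    if permutation_pvalue(x, y) >= alpha:
        return None  # no evidence of association, so stop growing
    return best_split(x, y)


# Hypothetical toy data: outcome jumps from about 1 to about 3 at age 40.
ages = list(range(20, 60))
outcome = [(1.0 if a < 40 else 3.0) + 0.1 * ((a % 3) - 1) for a in ages]

threshold = grow_one_split(ages, outcome)
print("split at age <=", threshold)
```

In this sketch the hypothesis test acts as a built-in stopping rule, which is one way to read the abstract's point that a formal testing framework simplifies choosing the final tree; a CART-style tree would instead typically be grown large and pruned afterwards.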

Similar articles

Examining the use of decision trees in population health surveillance research: an application to youth mental health survey data in the COMPASS study.

Health Promot Chronic Dis Prev Can. 2023 Feb;43(2):73-86. doi: 10.24095/hpcdp.43.2.03.

Unbiased Recursive Partitioning Enables Robust and Reliable Outcome Prediction in Acute Spinal Cord Injury.

J Neurotrauma. 2022 Feb;39(3-4):266-276. doi: 10.1089/neu.2020.7407. Epub 2021 Apr 7.

Quantitative methods for descriptive intersectional analysis with binary health outcomes.

SSM Popul Health. 2022 Jan 22;17:101032. doi: 10.1016/j.ssmph.2022.101032. eCollection 2022 Mar.

Exploratory Data Mining Techniques (Decision Tree Models) for Examining the Impact of Internet-Based Cognitive Behavioral Therapy for Tinnitus: Machine Learning Approach.

J Med Internet Res. 2021 Nov 2;23(11):e28999. doi: 10.2196/28999.

The future of Cochrane Neonatal.

Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

A novel approach to build accurate and diverse decision tree forest.

Evol Intell. 2022;15(1):439-453. doi: 10.1007/s12065-020-00519-0. Epub 2021 Jan 3.

Unbiased Recursive Partitioning to Stratify Patients with Acute Traumatic Spinal Cord Injuries: External Validity in an Observational Cohort Study.

J Neurotrauma. 2019 Sep 15;36(18):2732-2742. doi: 10.1089/neu.2018.6335. Epub 2019 Apr 10.

Regression Trees for Longitudinal Data with Baseline Covariates.

Biostat Epidemiol. 2019;3(1):1-22. doi: 10.1080/24709360.2018.1557797. Epub 2018 Dec 31.

Evolutionary algorithms and decision trees for predicting poor outcome after endovascular treatment for acute ischemic stroke.

Comput Biol Med. 2021 Jun;133:104414. doi: 10.1016/j.compbiomed.2021.104414. Epub 2021 Apr 21.

Cited by

Co-occurrence of malaria and Chagas disease in the Brazilian Amazon: the need for integrated health surveillance.

Cad Saude Publica. 2025 Jun 9;41Suppl 1(Suppl 1):e00042124. doi: 10.1590/0102-311XEN042124. eCollection 2025.

Evaluating the sample size requirements of tree-based ensemble machine learning techniques for clinical risk prediction.

Stat Methods Med Res. 2025 Jul;34(7):1356-1372. doi: 10.1177/09622802251338983. Epub 2025 May 14.

Utilizing artificial intelligence to predict and analyze socioeconomic, environmental, and healthcare factors driving tuberculosis globally.

Sci Rep. 2025 Apr 19;15(1):13619. doi: 10.1038/s41598-025-96973-w.

Machine learning in physical activity, sedentary, and sleep behavior research.

J Act Sedentary Sleep Behav. 2024 Jan 30;3(1):5. doi: 10.1186/s44167-024-00045-9.

Prognostic impact of late gadolinium enhancement granularity in non-ischemic dilated cardiomyopathy.

Eur Radiol. 2025 Feb 8. doi: 10.1007/s00330-025-11404-8.

Identifying intersectional groups at risk for missing breast cancer screening: Comparing regression- and decision tree-based approaches.

SSM Popul Health. 2024 Dec 9;29:101736. doi: 10.1016/j.ssmph.2024.101736. eCollection 2025 Mar.

BMJ Paediatr Open. 2024 Oct 10;8(1):e002885. doi: 10.1136/bmjpo-2024-002885.

Identifying combinations of long-term conditions associated with sarcopenia: a cross-sectional decision tree analysis in the UK Biobank study.

BMJ Open. 2024 Sep 5;14(9):e085204. doi: 10.1136/bmjopen-2024-085204.

Assessing the Value of Imaging Data in Machine Learning Models to Predict Patient-Reported Outcome Measures in Knee Osteoarthritis Patients.

Bioengineering (Basel). 2024 Aug 12;11(8):824. doi: 10.3390/bioengineering11080824.

Predictors of early child development for screening pregnant women most in need of support in Brazil.

J Glob Health. 2024 Aug 23;14:04143. doi: 10.7189/jogh.14.04143.

References

Trends in Obesity Among Adults in the United States, 2005 to 2014.

JAMA. 2016 Jun 7;315(21):2284-91. doi: 10.1001/jama.2016.6458.

Measuring wanting and liking from animals to humans: A systematic review.

Neurosci Biobehav Rev. 2016 Apr;63:124-42. doi: 10.1016/j.neubiorev.2016.01.006. Epub 2016 Feb 3.

An observational study identifying obese subgroups among older adults at increased risk of mobility disability: do perceptions of the neighborhood environment matter?

Int J Behav Nutr Phys Act. 2015 Dec 18;12:157. doi: 10.1186/s12966-015-0322-1.

Identifying risk profiles for childhood obesity using recursive partitioning based on individual, familial, and neighborhood environment factors.

Int J Behav Nutr Phys Act. 2015 Feb 15;12:17. doi: 10.1186/s12966-015-0175-7.

An application in identifying high-risk populations in alternative tobacco product use utilizing logistic regression and CART: a heuristic comparison.

BMC Public Health. 2015 Apr 9;15:341. doi: 10.1186/s12889-015-1582-z.

Risk profiles for weight gain among postmenopausal women: a classification and regression tree analysis approach.

PLoS One. 2015 Mar 30;10(3):e0121430. doi: 10.1371/journal.pone.0121430. eCollection 2015.

Associations between sleep parameters and food reward.

J Sleep Res. 2015 Jun;24(3):346-50. doi: 10.1111/jsr.12275. Epub 2015 Jan 23.

Classification and regression trees for epidemiologic research: an air pollution example.

Environ Health. 2014 Mar 13;13(1):17. doi: 10.1186/1476-069X-13-17.

Portion size effects on weight gain in a free living setting.

Obesity (Silver Spring). 2014 Jun;22(6):1400-5. doi: 10.1002/oby.20720. Epub 2014 Feb 19.

Questionnaire and laboratory measures of eating behavior. Associations with energy intake and BMI in a community sample of working adults.

Appetite. 2014 Jan;72:50-8. doi: 10.1016/j.appet.2013.09.020. Epub 2013 Oct 2.
