Department of Psychology, University of Kansas, Lawrence, Kansas, United States of America.
PLoS One. 2023 Oct 5;18(10):e0292341. doi: 10.1371/journal.pone.0292341. eCollection 2023.
There is considerable geographic heterogeneity in obesity prevalence across counties in the United States. Machine learning algorithms accurately predict geographic variation in obesity prevalence, but the models are often uninterpretable and viewed as black boxes.
The goal of this study is to extract knowledge from machine learning models for county-level variation in obesity prevalence.
This study applies explainable artificial intelligence methods to machine learning models of cross-sectional obesity prevalence data collected from 3,142 counties in the United States. County-level features were drawn from 7 broad categories: health outcomes, health behaviors, clinical care, social and economic factors, physical environment, demographics, and severe housing conditions. The explainable methods applied to the random forest prediction models were feature importance, accumulated local effects, a global surrogate decision tree, and local interpretable model-agnostic explanations (LIME).
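Two of the methods named above, feature importance and a global surrogate decision tree, can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data, not the county-level dataset; the feature names, coefficients, sample size, and hyperparameters are all hypothetical, and permutation importance stands in for whichever importance measure the study actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for county-level features (hypothetical names and effects).
rng = np.random.default_rng(0)
n_counties = 1000
feature_names = ["physical_inactivity", "diabetes", "smoking", "noise"]
X = rng.normal(size=(n_counties, len(feature_names)))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n_counties)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Black-box model: a random forest regressor.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
r2 = forest.score(X_test, y_test)  # share of held-out variance explained

# Feature importance: drop in held-out score when one feature is shuffled.
perm = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
ranking = [feature_names[i] for i in np.argsort(perm.importances_mean)[::-1]]

# Global surrogate: a shallow tree trained to mimic the forest's predictions.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X_train, forest.predict(X_train))
# Fidelity: how well the surrogate reproduces the forest on held-out data.
fidelity = surrogate.score(X_test, forest.predict(X_test))

print("R^2 of forest:", round(r2, 3))
print("Importance ranking:", ranking)
print("Surrogate fidelity:", round(fidelity, 3))
```

The surrogate is fit to the forest's predictions rather than to the observed outcome, so its score measures how faithfully the simple tree summarizes the black-box model, not how well it predicts obesity prevalence.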
The results show that machine learning models explained 79% of the variance in obesity prevalence, with physical inactivity, diabetes, and smoking prevalence being the most important factors in predicting obesity prevalence.
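"Variance explained" is conventionally reported as the coefficient of determination, R² = 1 − SS_res / SS_tot; whether the 79% figure was computed exactly this way is an assumption. The sketch below works through the arithmetic on made-up prevalence values that are not taken from the study.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted obesity prevalence (%) for four counties.
observed = [30.0, 35.0, 25.0, 40.0]
predicted = [31.0, 33.0, 27.0, 39.0]
score = r_squared(observed, predicted)
print(round(score, 2))  # SS_res = 10, SS_tot = 125, so R^2 = 0.92
```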
Interpretable machine learning models of health behaviors and outcomes provide substantial insight into obesity prevalence variation across counties in the United States.