Obtaining the Most Accurate, Explainable Model for Predicting Chronic Obstructive Pulmonary Disease: Triangulation of Multiple Linear Regression and Machine Learning Methods.

Authors

Kamis Arnold, Gadia Nidhi, Luo Zilin, Ng Shu Xin, Thumbar Mansi

Affiliation

Brandeis International Business School, Brandeis University, Waltham, MA, United States.

Publication

JMIR AI. 2024 Aug 29;3:e58455. doi: 10.2196/58455.

DOI: 10.2196/58455
PMID: 39207843
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11393512/
Abstract

BACKGROUND

Lung disease is a severe problem in the United States. Despite the decreasing rates of cigarette smoking, chronic obstructive pulmonary disease (COPD) continues to be a health burden in the United States. In this paper, we focus on COPD in the United States from 2016 to 2019.

OBJECTIVE

We gathered a diverse set of non-personally identifiable information from public data sources to better understand and predict COPD rates at the core-based statistical area (CBSA) level in the United States. Our objective was to compare linear models with machine learning models to obtain the most accurate and interpretable model of COPD.

METHODS

We integrated non-personally identifiable information from multiple Centers for Disease Control and Prevention sources and used them to analyze COPD with different types of methods. We included cigarette smoking, a well-known contributing factor, and race/ethnicity because health disparities among different races and ethnicities in the United States are also well known. The models also included the air quality index, education, employment, and economic variables. We fitted models with both multiple linear regression and machine learning methods.

RESULTS

The most accurate multiple linear regression model has variance explained of 81.1%, mean absolute error of 0.591, and symmetric mean absolute percentage error of 9.666. The most accurate machine learning model has variance explained of 85.7%, mean absolute error of 0.456, and symmetric mean absolute percentage error of 6.956. Overall, cigarette smoking and household income are the strongest predictor variables. Moderately strong predictors include education level and unemployment level, as well as American Indian or Alaska Native, Black, and Hispanic population percentages, all measured at the CBSA level.
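
The three accuracy measures reported above can be computed as in the following minimal sketch. The abstract does not state which sMAPE variant was used, so the formula here (absolute error divided by the mean of the absolute actual and predicted values, expressed in percent) is an assumption, as are the function names and the four made-up rate values, which are illustrative only and are not data from the study.

```python
from statistics import fmean


def variance_explained(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, returned as a fraction (0.811 means 81.1%)."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed variant, bounded by 0 and 200)."""
    terms = [
        abs(t - p) / ((abs(t) + abs(p)) / 2)
        for t, p in zip(y_true, y_pred)
    ]
    return 100.0 * fmean(terms)


# Hypothetical COPD rates (percent) for four areas; not taken from the paper.
y_true = [6.0, 7.5, 5.0, 8.0]
y_pred = [6.5, 7.0, 5.5, 7.5]

r2 = variance_explained(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
smape_pct = smape(y_true, y_pred)
print(f"R^2={r2:.3f}  MAE={mae:.3f}  sMAPE={smape_pct:.3f}%")
```

Unlike the mean absolute error, sMAPE is scale-free, which is presumably why both are reported: the same absolute miss counts for more in an area with a low COPD rate than in one with a high rate.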

CONCLUSIONS

This research highlights the importance of using diverse data sources as well as multiple methods to understand and predict COPD. The most accurate model was a gradient boosted tree, which captured nonlinearities in a model whose accuracy is superior to the best multiple linear regression. Our interpretable models suggest ways that individual predictor variables can be used in tailored interventions aimed at decreasing COPD rates in specific demographic and ethnographic communities. Gaps in understanding the health impacts of poor air quality, particularly in relation to climate change, suggest a need for further research to design interventions and improve public health.
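
To illustrate why a gradient boosted tree can outperform a linear model when the relationship is nonlinear, the sketch below fits both to a synthetic U-shaped curve. It is a toy under stated assumptions, not the authors' model: the data are invented, there is a single predictor, the boosting uses depth-1 trees (stumps) with squared-error loss, the errors are in-sample, and all function names and hyperparameters are hypothetical.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns a predict function."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_stump(xs, residuals):
    """Find the single split that minimizes squared error on the residuals."""
    best = None
    cuts = sorted(set(xs))
    for lo, hi in zip(cuts, cuts[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = fmean(left), fmean(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits current residuals."""
    base = fmean(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mae(ys, preds):
    return fmean([abs(y - p) for y, p in zip(ys, preds)])


# Synthetic U-shaped relationship (assumption): a straight line cannot follow it.
xs = [i / 2 for i in range(21)]      # 0.0, 0.5, ..., 10.0
ys = [(x - 5) ** 2 for x in xs]

linear = fit_line(xs, ys)
boosted = fit_boosted_stumps(xs, ys)
linear_mae = mae(ys, [linear(x) for x in xs])
boosted_mae = mae(ys, [boosted(x) for x in xs])
print(f"linear MAE={linear_mae:.3f}  boosted MAE={boosted_mae:.3f}")
```

Because these are training-set errors, the gap shown here overstates what would hold on unseen data; a fair comparison would score both models on held-out observations.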

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87a4/11393512/5d3d956a119e/ai_v3i1e58455_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87a4/11393512/1b4943b5f9dc/ai_v3i1e58455_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87a4/11393512/48fe25cea02a/ai_v3i1e58455_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87a4/11393512/d1db3b64527a/ai_v3i1e58455_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87a4/11393512/8a483ff22456/ai_v3i1e58455_fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87a4/11393512/2209f012c89b/ai_v3i1e58455_fig6.jpg
