A systematic review of data mining and machine learning for air pollution epidemiology.

Authors

Bellinger Colin, Mohomed Jabbar Mohomed Shazan, Zaïane Osmar, Osornio-Vargas Alvaro

Affiliations

Department of Computing Science, University of Alberta, Edmonton, Canada.

Department of Paediatrics, University of Alberta, Edmonton, Canada.

Publication information

BMC Public Health. 2017 Nov 28;17(1):907. doi: 10.1186/s12889-017-4914-3.

DOI: 10.1186/s12889-017-4914-3

PMID: 29179711

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5704396/

Abstract

BACKGROUND

Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology.

METHODS

We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We searched PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed.

RESULTS

Our search queries returned 400 research articles. Applying our inclusion/exclusion criteria in a fine-grained analysis reduced these to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications favoured artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the Apriori algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. As potential new directions, we identify deep learning and geo-spatial pattern mining as two burgeoning areas of data mining with good potential for future applications in air pollution epidemiology.

CONCLUSIONS

We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air pollution epidemiology. The potential to support air pollution epidemiology continues to grow with advancements in data mining related to temporal and geo-spatial mining, and deep learning. This is further supported by new sensors and storage media that enable larger, better-quality data. Together, these trends suggest that many more fruitful applications can be expected in the future.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a362/5704396/2b9b75aff338/12889_2017_4914_Fig1_HTML.jpg
