Bellinger Colin, Mohomed Jabbar Mohomed Shazan, Zaïane Osmar, Osornio-Vargas Alvaro
Department of Computing Science, University of Alberta, Edmonton, Canada.
Department of Paediatrics, University of Alberta, Edmonton, Canada.
BMC Public Health. 2017 Nov 28;17(1):907. doi: 10.1186/s12889-017-4914-3.
Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology.
We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We searched PubMed, the MEDLINE database and Google Scholar, and retrieved and reviewed research articles applying data mining and machine learning methods to air pollution epidemiology.
Our search queries returned 400 research articles. Applying our inclusion/exclusion criteria in a fine-grained analysis reduced these to 47 articles, which we grouped into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution, air quality or exposure; and 3) generating hypotheses. Early applications favoured artificial neural networks. More recent work has widely applied decision trees, support vector machines, k-means clustering and the APRIORI algorithm. Our survey shows that most of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. As potential new directions, we identified deep learning and geospatial pattern mining as two burgeoning areas of data mining with good potential for future applications in air pollution epidemiology.
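To make the "generating hypotheses" category and the APRIORI algorithm more concrete, the following is a minimal sketch, not drawn from any of the reviewed studies. It assumes each record is a set of discretized labels (the item names `high_PM2.5`, `low_temp` and `asthma_ED_visit` and the five toy records are hypothetical). Apriori finds itemsets whose support meets a threshold, growing candidates level by level and pruning any candidate with an infrequent subset; association rules are then scored by confidence. A high-confidence rule is only a candidate hypothesis for conventional epidemiological testing, not evidence of causation.

```python
from itertools import combinations

# Hypothetical toy records: each set holds discretized exposure/outcome labels.
records = [
    {"high_PM2.5", "low_temp", "asthma_ED_visit"},
    {"high_PM2.5", "asthma_ED_visit"},
    {"high_PM2.5", "low_temp"},
    {"low_temp"},
    {"high_PM2.5", "low_temp", "asthma_ED_visit"},
]


def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset (frozenset) to its support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Support = fraction of records containing the candidate itemset.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            # Prune step: keep only candidates whose (k-1)-subsets are all frequent.
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                ante = frozenset(ante)
                # Every subset of a frequent itemset is frequent, so this key exists.
                confidence = support / frequent[ante]
                if confidence >= min_confidence:
                    rules.append((ante, itemset - ante, support, confidence))
    return rules


if __name__ == "__main__":
    for ante, cons, sup, conf in association_rules(apriori(records, 0.4), 0.9):
        print(sorted(ante), "->", sorted(cons), f"support={sup:.2f} confidence={conf:.2f}")
```

On these toy records, the rule {asthma_ED_visit} -> {high_PM2.5} has confidence 1.0; in a real study such a rule would prompt a formal exposure-response analysis rather than stand as a finding.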
We carried out a systematic review identifying current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air pollution epidemiology. Its potential to support the field continues to grow with advances in temporal and geospatial mining and in deep learning, and is further supported by new sensors and storage media that enable larger, higher-quality datasets. This suggests that many more fruitful applications can be expected in the future.