Semi-supervised urban haze pollution prediction based on multi-source heterogeneous data.

Author information

Liu Zuhan, Wang Lili

Affiliations

School of Information Engineering, Nanchang Institute of Technology, Nanchang, China.

College of Science, Nanchang Institute of Technology, Nanchang, China.

Publication information

Heliyon. 2024 Jun 19;10(12):e33332. doi: 10.1016/j.heliyon.2024.e33332. eCollection 2024 Jun 30.

Abstract

Particulate matter (PM) is defined by the Texas Commission on Environmental Quality (TCEQ) as "a mixture of solid particles and liquid droplets found in the air". These particles vary widely in size; those with an aerodynamic diameter of less than 2.5 μm are known as Particulate Matter 2.5, or PM2.5. Urban haze pollution, represented by PM2.5, is becoming increasingly serious, which makes air pollution monitoring very important. However, because of their high cost, the number of air monitoring stations is limited. Our work focuses on integrating multi-source heterogeneous data for Nanchang, China: taxi trajectories, human mobility, road networks, Points of Interest (POIs), meteorology (e.g., temperature, dew point, humidity, wind speed, wind direction, atmospheric pressure, weather activity, weather conditions), and PM2.5 forecast data from air monitoring stations. This research presents an innovative approach to air quality prediction that integrates these data sets from various sources and applies diverse architectures to them. To that end, semi-supervised learning techniques are used, namely the collaborative training algorithm Co-Training (Co-T) and a further adjustment of it, Tri-Training (Tri-T). The objective is to accurately estimate haze pollution by integrating and using these multi-source heterogeneous data. We achieved this for the first time by employing a semi-supervised co-training strategy to estimate pollution levels after applying the U-Air system to environmental data. In particular, the algorithm of the U-Air system is reproduced on these highly diverse heterogeneous data for Nanchang, and the semi-supervised learners Co-T and Tri-T are used to produce a more detailed prediction of urban haze pollution. Compared with Co-T, which trains a time classifier (TC) and a subspace classifier (SC) separately from distinct temporal and spatial perspectives, Tri-T is more accurate and faster, with a testing accuracy of up to 85.62%. The forecast results also demonstrate the potential of the city's multi-source heterogeneous data and the effectiveness of semi-supervised learning. We hope that this synthesis will motivate atmospheric environmental officials, scientists, and environmentalists in China to explore machine learning technology for controlling pollutant discharge and for environmental management.
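
The co-training idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier, the names `CentroidClassifier` and `co_train`, the two toy feature views, and the "low"/"high" pollution labels are all assumptions made for illustration. The sketch only shows the general pattern in which two classifiers, each trained on its own view of the same locations, take turns pseudo-labeling the unlabeled points they are most confident about.

```python
# Minimal co-training sketch. The classifier, feature views, and data are
# illustrative assumptions, not the paper's implementation.

class CentroidClassifier:
    """Nearest-centroid classifier, a simple stand-in for TC or SC."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, value in enumerate(x):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_with_confidence(self, x):
        # Squared Euclidean distance from x to every class centroid.
        dists = {c: sum((a - b) ** 2 for a, b in zip(x, mu))
                 for c, mu in self.centroids.items()}
        ranked = sorted(dists, key=dists.get)
        best = ranked[0]
        if len(ranked) == 1:
            return best, 1.0
        # Confidence is the normalized margin between the two nearest centroids.
        total = dists[best] + dists[ranked[1]]
        margin = dists[ranked[1]] - dists[best]
        return best, (1.0 if total == 0 else margin / total)


def co_train(view_a, view_b, labels, rounds=5, per_round=2):
    """labels[i] is a class label, or None where no station reading exists."""
    labels = list(labels)
    clf_a, clf_b = CentroidClassifier(), CentroidClassifier()

    def refit():
        known = [i for i, y in enumerate(labels) if y is not None]
        clf_a.fit([view_a[i] for i in known], [labels[i] for i in known])
        clf_b.fit([view_b[i] for i in known], [labels[i] for i in known])

    for _ in range(rounds):
        refit()
        if all(y is not None for y in labels):
            break
        # Each classifier pseudo-labels its most confident unlabeled points,
        # which enlarges the training set seen by the other classifier.
        for clf, view in ((clf_a, view_a), (clf_b, view_b)):
            candidates = []
            for i, y in enumerate(labels):
                if y is None:
                    pred, conf = clf.predict_with_confidence(view[i])
                    candidates.append((i, pred, conf))
            candidates.sort(key=lambda t: t[2], reverse=True)
            for i, pred, _conf in candidates[:per_round]:
                labels[i] = pred
    refit()  # final fit on true labels plus pseudo-labels
    return clf_a, clf_b, labels


# Toy data: 20 locations, two well-separated classes, two feature views.
truth = ["low"] * 10 + ["high"] * 10
view_a = [[0.1 * k] for k in range(10)] + [[5.0 + 0.1 * k] for k in range(10)]
view_b = ([[0.2 * k, 1.0] for k in range(10)]
          + [[6.0 + 0.2 * k, 3.0] for k in range(10)])
# Only every fifth location has a known label, mimicking sparse stations.
labels = [t if i % 5 == 0 else None for i, t in enumerate(truth)]

clf_a, clf_b, final_labels = co_train(view_a, view_b, labels)
print(sum(p == t for p, t in zip(final_labels, truth)), "of", len(truth), "labels match")
```

Tri-training, in its usual formulation, extends this pattern to three classifiers: an unlabeled point is added to one classifier's training set when the other two agree on its label, so no explicit confidence score is needed.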

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3288/11252978/67ede1475491/gr1.jpg
