Liu Xiliang, Zhi Xiaoying, Zhou Tao, Zhao Liyou, Tian Li, Gao Ruoyun, Luo Jiashuo, Cui WenQiong, Wang Qi
Beijing University of Technology, Beijing, 100124, China.
Aerospace Engineering University, Beijing, 101416, China.
Sci Data. 2025 Jul 24;12(1):1288. doi: 10.1038/s41597-025-05591-8.
Urban air pollution poses a global health risk. This study presents the Airware-Haikou dataset, a resource for urban air pollution research that integrates multivariate time-series air quality monitoring data (MTSAM), Point of Interest (POI) data, and a public complaint corpus. The MTSAM, collected from 95 monitoring stations in Haikou, China, includes hourly measurements of six air pollutants and five meteorological factors. The data underwent rigorous pre-processing, including spatial-temporal interpolation and rebalancing, to ensure consistency and reliability. Using POI data and monitoring station coordinates, the MTSAM was segmented into four spatial-temporal subsets via cluster analysis, enabling detailed characterization of air quality dynamics. The public complaint corpus, extracted using the UIE model, serves as a baseline for post hoc interpretation of deep learning models, linking public sentiment with empirical air quality data. The Airware-Haikou dataset offers a comprehensive foundation for urban air pollution studies, while its validation model, DsRL-Net, significantly enhances the accuracy and reliability of pollution detection, advancing research in this critical field.
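The abstract does not name the clustering algorithm or the features used to segment the stations. As an illustration only, the following sketch groups a handful of hypothetical stations into four clusters with plain k-means (Lloyd's algorithm) and a deterministic farthest-point initialization. The feature layout (longitude, latitude, a normalized POI density), the station values, and the function names are all assumptions made for this example, not the dataset's schema or the authors' method.

```python
import math

def farthest_point_init(points, k):
    """Pick k starting centers: the first point, then repeatedly the point
    farthest from its nearest already-chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means on a list of equal-length feature tuples."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Hypothetical stations: (longitude, latitude, normalized POI density).
# Values are invented for illustration and are not taken from the dataset.
stations = [
    (110.20, 19.95, 0.10), (110.21, 19.96, 0.12), (110.19, 19.94, 0.11),
    (110.35, 20.03, 0.80), (110.34, 20.04, 0.82), (110.36, 20.02, 0.79),
    (110.45, 19.90, 0.45), (110.46, 19.91, 0.47), (110.44, 19.89, 0.44),
    (110.30, 20.10, 0.30), (110.31, 20.11, 0.32), (110.29, 20.09, 0.29),
]

labels, centers = kmeans(stations, k=4)
for lab in sorted(set(labels)):
    print(lab, [i for i, l in enumerate(labels) if l == lab])
```

With real data, the features would normally be standardized before clustering so that no single column dominates the distance, and the number of clusters would be checked against a validity measure rather than fixed in advance.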