Madokoro Hirokazu, Nix Stephanie
Faculty of Software and Information Science, Iwate Prefectural University, Takizawa 020-0693, Iwate, Japan.
Sensors (Basel). 2025 Jun 29;25(13):4053. doi: 10.3390/s25134053.
This paper presents a novel approach for predicting Particulate Matter (PM) concentrations using mobile camera devices. In response to persistent air pollution challenges across Japan, we developed a system that uses transformer-based deep learning architectures to estimate PM values from images captured by smartphone cameras. Our approach employs Contrastive Language-Image Pre-Training (CLIP) as a multimodal framework to extract visual features associated with PM concentration from environmental scenes. We first established a baseline through a comparative analysis of time-series models for 1D PM signal prediction, finding that linear models, particularly NLinear, outperformed more complex transformer architectures on short-term forecasting tasks. Building on these insights, we implemented a CLIP-based system for 2D image analysis that achieved a Top-1 accuracy of 0.24 and a Top-5 accuracy of 0.52 when tested on diverse smartphone-captured images. Performance evaluations on Graphics Processing Unit (GPU) and Single-Board Computer (SBC) platforms highlight a viable path toward edge deployment: processing times of 0.29 s per image on the GPU versus 2.68 s on the SBC demonstrate the potential for scalable, real-time environmental monitoring. This research connects high-performance computing with energy-efficient hardware, creating a practical framework for distributed environmental monitoring that reduces reliance on costly centralized monitoring systems. Our findings indicate that transformer-based multimodal models are a promising approach for mobile sensing applications, with opportunities for further improvement through seasonal data expansion and architectural refinements.
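The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the Top-1 and Top-5 figures come from ranking a fixed set of candidate PM classes per image, and it uses the NLinear idea as generally described (subtract the last observed value, apply a linear map, add the value back). The function names, the toy scores, and the weights are hypothetical and chosen only for illustration; only the two timing values are taken from the abstract.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(scores)


def nlinear_forecast(history, weights, bias):
    """NLinear-style forecast: center on the last value, apply a linear map, add it back."""
    last = history[-1]
    centered = [x - last for x in history]
    return [
        sum(w * x for w, x in zip(w_row, centered)) + b + last
        for w_row, b in zip(weights, bias)
    ]


# Toy similarity scores (made up): 4 images, 6 hypothetical PM classes.
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],
    [0.05, 0.05, 0.10, 0.20, 0.25, 0.35],
    [0.40, 0.30, 0.10, 0.10, 0.05, 0.05],
]
labels = [1, 2, 0, 0]  # hypothetical ground-truth class per image

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)

# With all-zero (untrained, hypothetical) weights, NLinear reduces to
# repeating the last observation over the forecast horizon.
history = [10.0, 12.0, 11.0, 13.0]
forecast = nlinear_forecast(history, [[0.0] * 4, [0.0] * 4], [0.0, 0.0])

# Per-image latencies reported in the abstract, in seconds.
GPU_SECONDS = 0.29
SBC_SECONDS = 2.68
slowdown = SBC_SECONDS / GPU_SECONDS

print(f"top-1={top1:.2f} top-3={top3:.2f}")
print(f"forecast={forecast}")
print(f"SBC is about {slowdown:.1f}x slower per image than the GPU")
```

On the reported timings, the SBC takes roughly 9.2 times as long per image as the GPU, which corresponds to about 22 images per minute on the SBC versus about 207 on the GPU.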