Department of Mathematics and Informatics, Ovidius University of Constanta, 124, Mamaia Bd., 900527 Constanta, Romania.
Department of Naval Electro-mechanics Systems, Mircea cel Batran Naval Academy, 1, Fulgerului Street, Romania.
Sci Total Environ. 2021 Jan 20;753:141993. doi: 10.1016/j.scitotenv.2020.141993. Epub 2020 Aug 26.
Official statistical reports generally describe the extent of pollution over a region using the average of the records from all observation sites. In the presence of outliers, the average is not a good choice. Therefore, in this article, we propose two alternatives that replace the average series with the most significant regional series, obtained by two selection procedures. The first algorithm chooses the candidates used for the regional estimation of pollution by a data segmentation that provides the most representative value for a given time interval. Since the number of segments must be specified in advance, the second algorithm proposes a version of the selection procedure based on the k-means algorithm. The performance of these methods is verified on three groups of series (carbon oxides, sulfur oxides, and nitrogen oxides) recorded in the EEA33 countries over a period of 28 years. Both algorithms give better results than the average series in terms of mean squared error (MSE) and mean absolute error (MAE).
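The idea of picking a representative value per time interval, instead of averaging over all sites, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not give the selection rule, so the rule used here (cluster the site values at each time step with a one-dimensional k-means and keep the centroid of the most populated cluster) is an assumption, and the names `kmeans_1d`, `representative_value`, and `regional_series`, as well as the sample data, are hypothetical.

```python
import numpy as np

def kmeans_1d(values, k, n_iter=50):
    """Plain Lloyd iterations on 1-D data (assumed variant, not the paper's).

    Centroids are initialised at evenly spaced quantiles so the result
    is deterministic.
    """
    values = np.asarray(values, dtype=float)
    centroids = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(n_iter):
        # Assign every value to its nearest centroid.
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = values[labels == j]
            if members.size > 0:  # leave empty clusters where they are
                updated[j] = members.mean()
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment against the final centroids.
    labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
    return labels, centroids

def representative_value(values, k=2):
    """Centroid of the most populated cluster (hypothetical selection rule)."""
    labels, centroids = kmeans_1d(values, k)
    counts = np.bincount(labels, minlength=k)
    return float(centroids[np.argmax(counts)])

def regional_series(data, k=2):
    """data has shape (n_sites, n_times); returns one value per time step."""
    data = np.asarray(data, dtype=float)
    return np.array([representative_value(data[:, t], k)
                     for t in range(data.shape[1])])

# Made-up example: four sites agree, one site is an outlier.
data = np.array([[10.0, 10.0, 10.0],
                 [10.0, 10.0, 10.0],
                 [10.0, 10.0, 10.0],
                 [10.0, 10.0, 10.0],
                 [100.0, 100.0, 100.0]])
average = data.mean(axis=0)            # 28 at every time step
selected = regional_series(data, k=2)  # 10 at every time step
```

In this toy case the outlier pulls the average to 28, while the cluster-based value stays at 10, the level shared by four of the five sites. The number of clusters `k` still has to be chosen by the user in this sketch.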