Piekarczyk Marcin, Hachaj Tomasz
Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, AGH University of Krakow, Al. Mickiewicza 30, 30-059 Krakow, Poland.
Sensors (Basel). 2024 Mar 13;24(6):1835. doi: 10.3390/s24061835.
In this paper, we propose a method for detecting potentially anomalous cosmic-ray particle tracks in a big-data image dataset acquired by complementary metal-oxide-semiconductor (CMOS) sensors. These sensors are part of the scientific infrastructure of the Cosmic Ray Extremely Distributed Observatory (CREDO). Using incremental principal component analysis (PCA) allows the loadings to be approximated and updated at runtime. Incremental PCA with the sequential Karhunen-Loeve transform yields an embedding almost identical to that of basic PCA. Depending on the image preprocessing method, the weighted distance between the coordinate frame and its approximation ranged from 0.01 to 0.02 radians for batches of 10,000 images. This batch-wise approach significantly reduces the memory needed for the computation, so our method can be applied to big data. The potential-anomaly detection algorithm is based on object density in the embedding space, and its parameters are intuitive, which makes the method easy to use. The sets of anomalies returned by our algorithm do not contain any typical particle-track morphologies. Thus, one can conclude that the proposed method effectively filters out typical (in terms of analysis of variance) particle-track shapes by searching for those that differ significantly from the others in the dataset. We also propose a method for finding similar objects, which could be used, for example, in minimum-distance classification and in querying the CREDO image database. The proposed algorithm was tested on more than half a million (570,000+) images that contain various morphologies of cosmic particle tracks. To our knowledge, this is the first study of this kind based on data collected using a distributed network of CMOS sensors embedded in the cell phones of participants collaborating within the citizen science paradigm.
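The two ideas summarized above, a batch-wise PCA update and a density test in the embedding space, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The class `SequentialKLT`, the function `sparse_points`, and the parameters `radius` and `min_neighbors` are hypothetical names chosen for illustration, and the neighbor-count rule is only one plausible reading of "object density", since the abstract does not specify the criterion. The synthetic data are exactly rank 3, so here the incremental result coincides with full PCA; on real images it would only approximate it.

```python
import numpy as np


class SequentialKLT:
    """Batch-wise PCA update in the style of the sequential Karhunen-Loeve transform."""

    def __init__(self, n_components):
        self.k = n_components
        self.n_seen = 0
        self.mean = None
        self.components = None        # shape (k, n_features), orthonormal rows
        self.singular_values = None   # shape (k,)

    def partial_fit(self, batch):
        batch = np.asarray(batch, dtype=float)
        m = batch.shape[0]
        batch_mean = batch.mean(axis=0)
        centered = batch - batch_mean
        if self.n_seen == 0:
            stacked, new_mean = centered, batch_mean
        else:
            n = self.n_seen
            # Extra row that accounts for the shift between old and new means.
            shift = np.sqrt(n * m / (n + m)) * (self.mean - batch_mean)
            # Old basis scaled by its singular values summarizes all earlier batches.
            old = self.singular_values[:, None] * self.components
            stacked = np.vstack([old, centered, shift[None, :]])
            new_mean = (n * self.mean + m * batch_mean) / (n + m)
        _, s, vt = np.linalg.svd(stacked, full_matrices=False)
        self.components, self.singular_values = vt[: self.k], s[: self.k]
        self.mean = new_mean
        self.n_seen += m
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.mean) @ self.components.T


def sparse_points(embedding, radius, min_neighbors):
    """Indices of points with fewer than min_neighbors others within radius.

    Uses a dense O(n^2) distance matrix, which is fine only for small demos.
    """
    diffs = embedding[:, None, :] - embedding[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    counts = (dist <= radius).sum(axis=1) - 1  # exclude the point itself
    return np.flatnonzero(counts < min_neighbors)


# Synthetic stand-in for flattened images: 600 samples, 16 features, rank 3.
rng = np.random.default_rng(0)
latent = rng.normal(size=(600, 3)) * np.array([3.0, 2.0, 1.0])
latent[0] = [25.0, -25.0, 25.0]                      # one planted outlier
mixing = np.linalg.qr(rng.normal(size=(16, 3)))[0].T  # orthonormal rows, (3, 16)
images = latent @ mixing

model = SequentialKLT(n_components=3)
for start in range(0, len(images), 200):              # three batches of 200
    model.partial_fit(images[start:start + 200])

# Reference: ordinary PCA on the whole dataset at once.
_, full_s, full_vt = np.linalg.svd(images - images.mean(axis=0), full_matrices=False)
cosines = np.abs(np.sum(model.components * full_vt[:3], axis=1))
axis_angles = np.arccos(np.clip(cosines, 0.0, 1.0))   # radians, per axis

embedding = model.transform(images)
anomalies = sparse_points(embedding, radius=5.0, min_neighbors=3)
```

Each call to `partial_fit` keeps only the current mean, the `k` basis vectors, and their singular values, so memory use depends on the batch size rather than on the total number of images. The per-axis angles in `axis_angles` are a simple, unweighted way to compare the incremental and full coordinate frames; the weighted distance reported in the abstract may be defined differently.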