Sadowska Maria, Gajowniczek Krzysztof
Institute of Information Technology, Warsaw University of Life Sciences-SGGW, 02-787 Warszawa, Poland.
Entropy (Basel). 2025 Jun 12;27(6):624. doi: 10.3390/e27060624.
This paper reports a study of how input data characteristics affect the performance of time-series classification models. The experiment used 82 synthetic time-series datasets generated from predefined functions with added noise. The datasets varied in structure, differing in the number of classes and in noise level, while the series length and the total number of observations were held constant. This design allowed us to systematically assess the influence of dataset characteristics on classification outcomes. Seven classification models were evaluated and compared on accuracy metrics, training time, and memory requirements. The CNN Classifier achieved the best results and was the most robust to an increasing number of classes and to increasing noise. The least effective model was the Catch22 Classifier. Overall, the results indicate that as the number of classes and the noise level in the data increase, all classification models become less effective and achieve lower accuracy.
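The dataset design described in the abstract (class-specific predefined functions plus noise, with fixed series length and a fixed total number of observations) can be illustrated with a minimal sketch. The abstract does not state which functions, noise distribution, or sizes were used, so the sine templates, the Gaussian noise, the helper name `make_dataset`, and all numeric defaults below are assumptions chosen only for illustration.

```python
import numpy as np


def make_dataset(n_classes, noise_std, n_series=600, length=100, seed=0):
    """Return (X, y): noisy series of fixed length, spread evenly over classes.

    Hypothetical helper; the sizes and templates are not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, length)
    # Assign labels round-robin so the total number of series stays fixed
    # no matter how many classes there are.
    y = np.arange(n_series) % n_classes
    # Assumed class templates: class k is a sine with k + 1 cycles per window.
    freqs = 1.0 + y
    clean = np.sin(2.0 * np.pi * freqs[:, None] * t[None, :])
    # Assumed noise model: additive zero-mean Gaussian noise.
    X = clean + rng.normal(0.0, noise_std, size=clean.shape)
    return X, y


# Vary the two characteristics studied: number of classes and noise level.
grid = [(k, s) for k in (2, 5, 10) for s in (0.1, 0.5, 1.0)]
datasets = {(k, s): make_dataset(k, s) for k, s in grid}
```

Every entry in `datasets` has the same shape, so differences in a classifier's accuracy across the grid can be attributed to the class count and noise level rather than to the amount of data.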