Alrawajfi Ala, Ismail Mohd Tahir, Al Wadi Sadam, Atiewi Saleh, Awajan Ahmad
School of Mathematical Science, Universiti Sains Malaysia, Penang, Penang, Malaysia.
Department of Financial and Administrative Sciences, Ma'an College, Al-Balqa Applied University, Maan, Maan, Jordan.
PeerJ Comput Sci. 2024 Sep 25;10:e2337. doi: 10.7717/peerj-cs.2337. eCollection 2024.
Data imputation strategies are needed to address the common problem of missing values in data observation and recording. This work applies several imputation methods to predict and fill in missing values in a financial time-series dataset, specifically daily gold prices. To assess robustness, the accuracy of the imputed data is compared against the original complete dataset. The imputation methods are validated using actual closing-price data obtained from a daily gold price website. The approaches examined are mean imputation, k-nearest neighbor (KNN), hot deck, random forest, support vector machine (SVM), and spline imputation. Their performance is evaluated with several metrics: mean error (ME), mean absolute error (MAE), root mean square error (RMSE), mean percentage error (MPE), and mean absolute percentage error (MAPE). The results indicate that the KNN approach consistently outperforms the other methods on all accuracy measures. Nevertheless, the accuracy of all techniques decreases as the proportion of missing data rises. The KNN approach is therefore recommended because of its exceptional performance and reliability in imputation tasks.
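The evaluation protocol summarized above (hide some known prices, impute them, then score the imputed values against the held-out originals) can be sketched as follows. This is a minimal illustration, not the authors' code. The random masking scheme, the synthetic price series, and the helper names are hypothetical; the error convention e = actual - imputed is assumed for ME and MPE; and `knn_time_impute` is a simplified univariate KNN that averages the k observations nearest in time, which may differ from the KNN imputation used in the study.

```python
import math
import random


def accuracy_metrics(actual, imputed):
    """ME, MAE, RMSE, MPE and MAPE over the imputed positions.

    Assumed convention: error e = actual - imputed; percentage
    metrics are relative to the actual value and scaled by 100.
    """
    errors = [a - p for a, p in zip(actual, imputed)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MPE": 100 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


def mask_values(series, fraction, seed=0):
    """Hypothetical helper: hide a random fraction of values (set to None)."""
    rng = random.Random(seed)
    k = int(round(fraction * len(series)))
    idx = sorted(rng.sample(range(len(series)), k))
    masked = list(series)
    for i in idx:
        masked[i] = None
    return masked, idx


def mean_impute(masked):
    """Replace every missing value with the mean of the observed values."""
    observed = [x for x in masked if x is not None]
    mu = sum(observed) / len(observed)
    return [mu if x is None else x for x in masked]


def knn_time_impute(masked, k=2):
    """Simplified univariate KNN (assumption): average the k observed
    values whose time index is closest to the missing position."""
    observed = [(i, x) for i, x in enumerate(masked) if x is not None]
    out = list(masked)
    for i, x in enumerate(masked):
        if x is None:
            nearest = sorted(observed, key=lambda t: abs(t[0] - i))[:k]
            out[i] = sum(v for _, v in nearest) / len(nearest)
    return out


if __name__ == "__main__":
    # Synthetic trending series standing in for daily closing prices (NOT real gold data).
    prices = [1900.0 + t + 5 * math.sin(t / 3) for t in range(60)]
    masked, idx = mask_values(prices, 0.2)
    for name, impute in [("mean", mean_impute), ("knn", knn_time_impute)]:
        filled = impute(masked)
        scores = accuracy_metrics([prices[i] for i in idx], [filled[i] for i in idx])
        print(name, {m: round(v, 3) for m, v in scores.items()})
```

Scoring only the masked positions keeps the metrics focused on the imputed values; repeating the run with larger `fraction` values is one way to observe how accuracy changes as the share of missing data grows.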