Multiple imputation methods: a case study of daily gold price.

Authors

Alrawajfi Ala, Ismail Mohd Tahir, Al Wadi Sadam, Atiewi Saleh, Awajan Ahmad

Affiliations

School of Mathematical Science, Universiti Sains Malaysia, Penang, Penang, Malaysia.

Department of Financial and Administrative Sciences, Ma'an College, Al-Balqa Applied University, Maan, Maan, Jordan.

Publication information

PeerJ Comput Sci. 2024 Sep 25;10:e2337. doi: 10.7717/peerj-cs.2337. eCollection 2024.

DOI:10.7717/peerj-cs.2337
PMID:39678293
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11639230/
Abstract

Imputation strategies are needed to address missing values, a common problem in data observation and recording. This work applies several imputation methods to estimate and fill in missing values in a financial time-series dataset, specifically daily gold prices. The accuracy of the imputed data is assessed against the original complete dataset to check its robustness. The methods are validated on actual closing-price data obtained from a daily gold price website. The methods examined are mean imputation, k-nearest neighbor (KNN), hot deck, random forest, support vector machine (SVM), and spline imputation. Performance is evaluated with five metrics: mean error (ME), mean absolute error (MAE), root mean square error (RMSE), mean percentage error (MPE), and mean absolute percentage error (MAPE). The results indicate that KNN consistently outperforms the other methods on all accuracy measures. However, the accuracy of every method decreases as the proportion of missing data rises. KNN is therefore recommended for its strong performance and reliability in imputation tasks.
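
The evaluation protocol described in the abstract (hide known values, impute them, score the imputed values against the originals) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the price series is synthetic, the missing values are removed uniformly at random, the error sign convention (actual minus imputed) is assumed, and `knn_time_impute` is a hypothetical, simplified nearest-neighbor-in-time imputer standing in for whatever KNN configuration the paper used. All function names are illustrative.

```python
import math
import random


def error_metrics(actual, imputed):
    """Return ME, MAE, RMSE, MPE and MAPE over paired values.

    Errors are actual - imputed (an assumed sign convention).
    MPE and MAPE are in percent and require non-zero actual values.
    """
    errors = [a - p for a, p in zip(actual, imputed)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MPE": 100.0 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


def mean_impute(series):
    """Replace every gap (None) with the mean of the observed values."""
    observed = [x for x in series if x is not None]
    fill = sum(observed) / len(observed)
    return [fill if x is None else x for x in series]


def knn_time_impute(series, k=2):
    """Fill each gap with the mean of the k observed points nearest in time.

    Simplified stand-in for KNN imputation; the paper's features,
    distance measure and k are not given in the abstract.
    """
    observed = [(i, x) for i, x in enumerate(series) if x is not None]
    filled = list(series)
    for i, x in enumerate(series):
        if x is None:
            nearest = sorted(observed, key=lambda item: abs(item[0] - i))[:k]
            filled[i] = sum(v for _, v in nearest) / len(nearest)
    return filled


def evaluate(series, missing_fraction, imputer, seed=0):
    """Hide a random fraction of values, impute them, score the hidden points."""
    rng = random.Random(seed)
    n_missing = max(1, int(len(series) * missing_fraction))
    hidden = set(rng.sample(range(len(series)), n_missing))
    masked = [None if i in hidden else x for i, x in enumerate(series)]
    filled = imputer(masked)
    idx = sorted(hidden)
    return error_metrics([series[i] for i in idx], [filled[i] for i in idx])


# Synthetic stand-in for a daily closing-price series (not real gold prices).
prices = [1800.0 + 0.5 * t + 10.0 * math.sin(t / 7.0) for t in range(200)]
for name, imputer in [("mean", mean_impute), ("knn", knn_time_impute)]:
    print(name, evaluate(prices, 0.10, imputer))
```

Calling `evaluate` with increasing `missing_fraction` values is one way to examine how accuracy changes as more data is missing, the effect the abstract reports for all methods.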

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/11639230/99e3b7060ab5/peerj-cs-10-2337-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/11639230/2290dc92cc3a/peerj-cs-10-2337-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/11639230/ce8b6a71767c/peerj-cs-10-2337-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/11639230/097d0cfc3295/peerj-cs-10-2337-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/11639230/69389bdfd995/peerj-cs-10-2337-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/11639230/7bafcb35e3f8/peerj-cs-10-2337-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/11639230/7ae05b8dcb13/peerj-cs-10-2337-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/11639230/629360566903/peerj-cs-10-2337-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/11639230/fe6fa1b3f765/peerj-cs-10-2337-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/11639230/67fbc04f7f0a/peerj-cs-10-2337-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/11639230/108ca6262c62/peerj-cs-10-2337-g011.jpg

Similar articles

1
Multiple imputation methods: a case study of daily gold price.
PeerJ Comput Sci. 2024 Sep 25;10:e2337. doi: 10.7717/peerj-cs.2337. eCollection 2024.
2
Advanced methods for missing values imputation based on similarity learning.
PeerJ Comput Sci. 2021 Jul 21;7:e619. doi: 10.7717/peerj-cs.619. eCollection 2021.
3
Comparison of the effects of imputation methods for missing data in predictive modelling of cohort study datasets.
BMC Med Res Methodol. 2024 Feb 16;24(1):41. doi: 10.1186/s12874-024-02173-x.
4
Addressing Missing Data Challenges in Geriatric Health Monitoring: A Study of Statistical and Machine Learning Imputation Methods.
Sensors (Basel). 2025 Jan 21;25(3):614. doi: 10.3390/s25030614.
5
Handling Missing Data in COVID-19 Incidence Estimation: Secondary Data Analysis.
JMIR Public Health Surveill. 2024 Aug 20;10:e53719. doi: 10.2196/53719.
6
Effects of missing data imputation methods on univariate blood pressure time series data analysis and forecasting with ARIMA and LSTM.
BMC Med Res Methodol. 2024 Dec 26;24(1):320. doi: 10.1186/s12874-024-02448-3.
7
A simulation study on missing data imputation for dichotomous variables using statistical and machine learning methods.
Sci Rep. 2023 Jun 9;13(1):9432. doi: 10.1038/s41598-023-36509-2.
8
[Simulation study on missing data imputation methods for longitudinal data in cohort studies].
Zhonghua Liu Xing Bing Xue Za Zhi. 2021 Oct 10;42(10):1889-1894. doi: 10.3760/cma.j.cn112338-20201130-01363.
9
Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies.
BMC Bioinformatics. 2017 Feb 20;18(1):114. doi: 10.1186/s12859-017-1547-6.
10
Binned Data Provide Better Imputation of Missing Time Series Data from Wearables.
Sensors (Basel). 2023 Jan 28;23(3):1454. doi: 10.3390/s23031454.
