Imputation methods for mixed datasets in bioarchaeology.

Authors

Ryan-Despraz Jessica, Wissler Amanda

Affiliations

Department of Physical Anthropology, University of Bern, Bern, Switzerland.

Department of Anthropology, McMaster University, Hamilton, Canada.

Publication information

Archaeol Anthropol Sci. 2024;16(11):187. doi: 10.1007/s12520-024-02078-2. Epub 2024 Oct 23.

DOI: 10.1007/s12520-024-02078-2
PMID: 39450370
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11496361/
Abstract

Missing data is a prevalent problem in bioarchaeological research, and imputation could provide a promising solution. This work simulated missingness on a control dataset (481 samples × 41 variables) in order to explore imputation methods for mixed data (qualitative and quantitative data). The tested methods included Random Forest (RF), PCA/MCA, factorial analysis for mixed data (FAMD), hotdeck, predictive mean matching (PMM), random samples from observed values (RSOV), and a multi-method (MM) approach for the three missingness mechanisms (MCAR, MAR, and MNAR) at levels of 5%, 10%, 20%, 30%, and 40% missingness. This study also compared single imputation with an adapted multiple imputation method derived from the R package "mice". The results showed that the adapted multiple imputation technique always outperformed single imputation for the same method. The best performing methods were most often RF and MM, and other commonly successful methods were PCA/MCA and PMM multiple imputation. Across all criteria, the amount of missingness was the most important parameter for imputation accuracy. While this study found that some imputation methods performed better than others for the control dataset, each imputation method has advantages and disadvantages. Imputation remains a promising solution for datasets containing missingness; however, when choosing a method, it is essential to consider dataset structure and research goals.

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1007/s12520-024-02078-2.
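
The simulation design described in the abstract (mask known values, impute them, compare against the truth) can be illustrated with a minimal Python sketch. It is not the authors' code: the toy two-column dataset, the function names `mask_mcar`, `impute_rsov`, and `score`, and the choice of RMSE and misclassification rate as accuracy criteria are all assumptions made for illustration. Only the MCAR mechanism and the RSOV method are shown, at the five missingness levels named in the abstract.

```python
import math
import random


def mask_mcar(rows, rate, rng):
    """Copy rows, setting each cell to None with probability `rate` (MCAR)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def impute_rsov(rows, rng):
    """Fill each missing cell with a random draw from its column's observed values."""
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    if any(not col for col in observed):
        raise ValueError("a column has no observed values to sample from")
    return [
        [rng.choice(observed[j]) if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


def score(truth, masked, imputed, col, numeric):
    """Error on masked cells only: RMSE (numeric) or misclassification rate."""
    pairs = [
        (t[col], i[col])
        for t, m, i in zip(truth, masked, imputed)
        if m[col] is None
    ]
    if not pairs:
        return 0.0
    if numeric:
        return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical mixed dataset: one quantitative and one qualitative variable.
rng = random.Random(0)
truth = [
    [rng.gauss(165.0, 8.0), rng.choice(["F", "M", "unknown"])]
    for _ in range(200)
]

for rate in (0.05, 0.10, 0.20, 0.30, 0.40):
    masked = mask_mcar(truth, rate, rng)
    imputed = impute_rsov(masked, rng)
    rmse = score(truth, masked, imputed, 0, numeric=True)
    miss = score(truth, masked, imputed, 1, numeric=False)
    print(f"{rate:.0%} missing: RMSE={rmse:.2f}, misclassified={miss:.2f}")
```

Because the control values are known before masking, any imputation method can be swapped in for `impute_rsov` and scored the same way. MAR and MNAR would require the masking probability to depend on observed or unobserved values respectively, which this sketch does not attempt.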


