Imputation methods for mixed datasets in bioarchaeology.

Author information

Ryan-Despraz Jessica, Wissler Amanda

Affiliations

Department of Physical Anthropology, University of Bern, Bern, Switzerland.

Department of Anthropology, McMaster University, Hamilton, Canada.

Publication information

Archaeol Anthropol Sci. 2024;16(11):187. doi: 10.1007/s12520-024-02078-2. Epub 2024 Oct 23.

Abstract

Missing data is a prevalent problem in bioarchaeological research, and imputation could provide a promising solution. This work simulated missingness on a control dataset (481 samples × 41 variables) in order to explore imputation methods for mixed data (qualitative and quantitative data). The tested methods included Random Forest (RF), PCA/MCA, factorial analysis for mixed data (FAMD), hotdeck, predictive mean matching (PMM), random samples from observed values (RSOV), and a multi-method (MM) approach. Each method was tested under the three missingness mechanisms, missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), at levels of 5%, 10%, 20%, 30%, and 40% missingness. This study also compared single imputation with an adapted multiple imputation method derived from the R package "mice". The results showed that the adapted multiple imputation technique always outperformed single imputation for the same method. The best performing methods were most often RF and MM, and other commonly successful methods were PCA/MCA and PMM multiple imputation. Across all criteria, the amount of missingness was the most important parameter for imputation accuracy. While this study found that some imputation methods performed better than others for the control dataset, each imputation method has advantages and disadvantages. Imputation remains a promising solution for datasets containing missingness; however, when choosing a method, it is essential to consider dataset structure and research goals.
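The simulate-then-impute design described in the abstract can be illustrated with a minimal sketch. It is not the authors' R workflow: it covers only the MCAR mechanism and the simplest listed method (random samples from observed values, RSOV), the toy dataset is invented, and the function names `simulate_mcar`, `impute_rsov`, and `score` are hypothetical. The scoring (categorical accuracy and numeric RMSE on the masked cells) is likewise an assumption, not the study's evaluation criteria.

```python
import random


def simulate_mcar(rows, rate, rng):
    """Return a copy of rows with a fraction `rate` of all cells set to None.

    Cells are chosen uniformly at random, independent of any value (MCAR).
    """
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    masked = [list(r) for r in rows]
    for i, j in rng.sample(cells, round(rate * len(cells))):
        masked[i][j] = None
    return masked


def impute_rsov(rows, rng):
    """Fill each None with a random draw from the observed values of its column."""
    filled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            continue  # nothing to sample from; the column stays missing
        for r in filled:
            if r[j] is None:
                r[j] = rng.choice(observed)
    return filled


def score(truth, masked, filled):
    """Compare imputed cells with the true values (assumed scoring scheme).

    Returns (accuracy over categorical cells, RMSE over numeric cells).
    """
    hits = total = 0
    sq_err = []
    for t_row, m_row, f_row in zip(truth, masked, filled):
        for t, m, f in zip(t_row, m_row, f_row):
            if m is not None or f is None:
                continue  # only score cells that were masked and then imputed
            if isinstance(t, str):
                total += 1
                hits += t == f
            else:
                sq_err.append((t - f) ** 2)
    accuracy = hits / total if total else None
    rmse = (sum(sq_err) / len(sq_err)) ** 0.5 if sq_err else None
    return accuracy, rmse


# Invented toy data with mixed types: sex, age class, a bone length in mm.
rng = random.Random(0)
toy = [
    (rng.choice(["F", "M"]),
     rng.choice(["young", "middle", "old"]),
     round(rng.gauss(440.0, 25.0), 1))
    for _ in range(50)
]

# Same missingness levels as in the abstract.
for rate in (0.05, 0.10, 0.20, 0.30, 0.40):
    masked = simulate_mcar(toy, rate, rng)
    filled = impute_rsov(masked, rng)
    print(rate, score(toy, masked, filled))
```

Because the true values are known before masking, any imputation method can be scored the same way. MAR and MNAR would need a masking rule that depends on observed or unobserved values, which this sketch does not attempt.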

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1007/s12520-024-02078-2.
