Missing value imputation strategies for metabolomics data.

Author information

Armitage Emily Grace, Godzien Joanna, Alonso-Herranz Vanesa, López-Gonzálvez Ángeles, Barbas Coral

Affiliations

Centre for Metabolomics and Bioanalysis (CEMBIO), Facultad de Farmacia, Universidad CEU San Pablo, Madrid, Spain.

Publication information

Electrophoresis. 2015 Dec;36(24):3050-60. doi: 10.1002/elps.201500352. Epub 2015 Oct 20.

Abstract

Missing values can arise for different reasons, and depending on their origin they should be considered and handled differently. In this research, four imputation methods were compared with respect to their effects on the normality and variance of the data, on statistical significance, and on the approximation of a suitable threshold for accepting missing data as truly missing. Additionally, different strategies for controlling the familywise error rate or the false discovery rate were evaluated, together with how they interact with the different missing value imputation strategies. Missing values were found to affect the normality and variance of the data, and k-means nearest neighbour imputation was the best method tested for restoring them. Bonferroni correction was the best method for maximizing true positives and minimizing false positives, and it was observed that missing data at levels as low as 40% could be truly missing. The range between 40 and 70% missing values was defined as a "gray area", and therefore a strategy has been proposed that balances the optimal imputation strategy, k-means nearest neighbour, against the best approximation for positioning real zeros.
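The abstract does not describe an implementation, so the following is only a minimal sketch of two of the techniques it names: nearest-neighbour imputation and Bonferroni correction. It assumes a NumPy matrix with samples in rows and features in columns, with missing values encoded as NaN. The function names (`missing_fraction`, `knn_impute`, `bonferroni`), the choice of `k`, and the distance measure are illustrative assumptions, not the authors' method.

```python
import numpy as np


def missing_fraction(X):
    """Return the fraction of missing (NaN) values in each feature column."""
    return np.isnan(np.asarray(X, dtype=float)).mean(axis=0)


def knn_impute(X, k=3):
    """Fill each NaN with the mean of that feature over the k nearest samples.

    Distance between two samples is the root-mean-square difference over
    the features observed in both. This is an assumed, simplified variant
    of nearest-neighbour imputation.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    observed = ~np.isnan(X)
    n_samples = X.shape[0]

    for i in range(n_samples):
        missing_cols = np.where(~observed[i])[0]
        if missing_cols.size == 0:
            continue

        # Distance from sample i to every other sample.
        dists = np.full(n_samples, np.inf)
        for j in range(n_samples):
            if j == i:
                continue
            shared = observed[i] & observed[j]
            if shared.any():
                diff = X[i, shared] - X[j, shared]
                dists[j] = np.sqrt(np.mean(diff ** 2))

        for c in missing_cols:
            # Donors must have the feature observed and a finite distance.
            donors = np.where(observed[:, c] & np.isfinite(dists))[0]
            if donors.size == 0:
                continue  # no usable neighbour: leave the value missing
            nearest = donors[np.argsort(dists[donors])[:k]]
            filled[i, c] = X[nearest, c].mean()

    return filled


def bonferroni(p_values, alpha=0.05):
    """Return a boolean mask of tests significant at alpha / number of tests."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size
```

The per-feature output of `missing_fraction` could be compared against the 40% and 70% bounds mentioned in the abstract to decide which features to impute and which to treat as truly missing; how exactly the proposed strategy handles the range between those bounds is not specified here, so that step is left out.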
