Missing data imputation via the expectation-maximization algorithm can improve principal component analysis aimed at deriving biomarker profiles and dietary patterns.

Affiliations

Centre of Excellence for Nutrition, North-West University, Potchefstroom, South Africa.

Centre of Excellence for Nutrition, North-West University, Potchefstroom, South Africa; Laboratory of Human Nutrition, Institute of Food, Nutrition and Health, ETH, Zurich, Switzerland.

Publication information

Nutr Res. 2020 Mar;75:67-76. doi: 10.1016/j.nutres.2020.01.001. Epub 2020 Jan 9.

DOI: 10.1016/j.nutres.2020.01.001
PMID: 32035304
Abstract

Principal component analysis (PCA) is a popular statistical tool. However, imputing missing data before PCA, although good practice with numerous advantages, is not common. In the present work, we evaluated the hypothesis that the expectation-maximization (EM) algorithm for missing data imputation is a reliable and advantageous procedure when using PCA to derive biomarker profiles and dietary patterns. To this end, we used numerical simulations designed to mimic real data commonly observed in nutritional research. Finally, we showed the advantages and pitfalls of EM-based missing data imputation applied to plasma fatty acid concentrations and nutrient intakes from real data sets derived from the US National Health and Nutrition Examination Survey (NHANES). PCA applied to simulated data with missing values yielded eigenvalues that were biased relative to those from the original data set without missing values. This bias increased with the number of missing values and appeared to be independent of the correlation structure among variables. In contrast, when data were imputed, the mean of the eigenvalues over the 10 imputation runs overlapped with the eigenvalues derived from PCA applied to the original data set. These results were confirmed when real data sets from NHANES were analyzed. We accept the hypothesis that EM imputation of missing data before PCA aimed at deriving biochemical profiles and dietary patterns is an effective technique, especially for relatively small sample sizes.
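The workflow described in the abstract (impute missing values with EM, then compare PCA eigenvalues against those from the complete data) can be illustrated with a minimal sketch. This is not the authors' code. It assumes a multivariate normal model, a single deterministic conditional-mean imputation rather than the 10 imputation runs reported, PCA on the correlation matrix, and values missing completely at random. The function names `em_impute` and `pca_eigenvalues`, the simulated covariance, the sample size, and the 10% missing rate are all hypothetical choices made for illustration.

```python
import numpy as np


def em_impute(X, n_iter=100, tol=1e-6):
    """Fill NaNs with conditional means under a multivariate normal model
    whose mean and covariance are estimated by EM (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Initialise with column means and the covariance of the mean-filled data.
    mu = np.nanmean(X, axis=0)
    Xf = np.where(miss, mu, X)
    sigma = np.cov(Xf, rowvar=False, bias=True)
    for _ in range(n_iter):
        # E-step: conditional means of missing entries, plus the summed
        # conditional covariance that the filled-in values do not carry.
        C = np.zeros((p, p))
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():  # fully missing row: fall back to the marginal
                Xf[i] = mu
                C += sigma
                continue
            # B = Sigma_oo^{-1} Sigma_om (regression of missing on observed)
            B = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)])
            Xf[i, m] = mu[m] + (X[i, o] - mu[o]) @ B
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - sigma[np.ix_(m, o)] @ B
        # M-step: update mean and covariance from the expected statistics.
        mu_new = Xf.mean(axis=0)
        d = Xf - mu_new
        sigma_new = (d.T @ d + C) / n
        change = np.max(np.abs(mu_new - mu)) + np.max(np.abs(sigma_new - sigma))
        mu, sigma = mu_new, sigma_new
        if change < tol:
            break
    return Xf, mu, sigma


def pca_eigenvalues(X):
    """Eigenvalues of the correlation matrix, largest first."""
    corr = np.corrcoef(X, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]


# Simulated example (all settings below are assumptions, not the paper's).
rng = np.random.default_rng(0)
n, p = 100, 5
true_cov = np.full((p, p), 0.6) + 0.4 * np.eye(p)  # unit variances, r = 0.6
X_full = rng.multivariate_normal(np.zeros(p), true_cov, size=n)
mask = rng.random((n, p)) < 0.10                   # about 10% missing
X_miss = np.where(mask, np.nan, X_full)

X_imp, mu_hat, sigma_hat = em_impute(X_miss)
print("complete data:", np.round(pca_eigenvalues(X_full), 3))
print("EM-imputed:   ", np.round(pca_eigenvalues(X_imp), 3))
```

Conditional-mean imputation alone tends to understate variance in the filled-in entries, which is why the sketch also returns the EM covariance estimate `sigma_hat`; averaging eigenvalues over several stochastic imputations, as the abstract describes, is another way to account for that uncertainty.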
