A Workflow for Missing Values Imputation of Untargeted Metabolomics Data.

Authors

Faquih Tariq, van Smeden Maarten, Luo Jiao, le Cessie Saskia, Kastenmüller Gabi, Krumsiek Jan, Noordam Raymond, van Heemst Diana, Rosendaal Frits R, van Hylckama Vlieg Astrid, Willems van Dijk Ko, Mook-Kanamori Dennis O

Affiliations

Department of Clinical Epidemiology, Leiden University Medical Center, Postal Zone C7-P, PO Box 9600, 2300 RC Leiden, The Netherlands.

Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 8, 3584 Utrecht, The Netherlands.

Publication information

Metabolites. 2020 Nov 26;10(12):486. doi: 10.3390/metabo10120486.

DOI: 10.3390/metabo10120486
PMID: 33256233
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7761057/
Abstract

Metabolomics studies have seen a steady growth due to the development and implementation of affordable and high-quality metabolomics platforms. In large metabolite panels, measurement values are frequently missing and, if neglected or sub-optimally imputed, can cause biased study results. We provided a publicly available, user-friendly script to streamline the imputation of missing endogenous, unannotated, and xenobiotic metabolites. We evaluated the multivariate imputation by chained equations (MICE) and k-nearest neighbors (kNN) analyses implemented in our script by simulations using measured metabolites data from the Netherlands Epidemiology of Obesity (NEO) study (n = 599). We simulated missing values in four unique metabolites from different pathways with different correlation structures in three sample sizes (599, 150, 50) with three missing percentages (15%, 30%, 60%), and using two missing mechanisms (completely at random and not at random). Based on the simulations, we found that for MICE, larger sample size was the primary factor decreasing bias and error. For kNN, the primary factor reducing bias and error was the metabolite correlation with its predictor metabolites. MICE provided consistently higher performance measures particularly for larger datasets (n > 50). In conclusion, we presented an imputation workflow in a publicly available script to impute untargeted metabolomics data. Our simulations provided insight into the effects of sample size, percentage missing, and correlation structure on the accuracy of the two imputation methods.
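The simulation design described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' script: it uses synthetic data with a single predictor, a simplified kNN imputer, and only the standard library, and it leaves MICE out entirely. All function names (`simulate_data`, `mask_mcar`, `mask_mnar`, `knn_impute`, `bias_and_rmse`) and the parameter `rho` are hypothetical, and modelling "not at random" missingness as removal of the lowest values is an assumption made here for illustration.

```python
import math
import random

def simulate_data(n, rho, rng):
    """Draw n (predictor, target) pairs whose correlation is about rho."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        rows.append((x, y))
    return rows

def mask_mcar(rows, frac, rng):
    """MCAR: every target value has the same chance of being removed."""
    k = round(frac * len(rows))
    return set(rng.sample(range(len(rows)), k))

def mask_mnar(rows, frac):
    """MNAR (assumed form): the lowest target values are removed,
    as would happen below a detection limit."""
    k = round(frac * len(rows))
    order = sorted(range(len(rows)), key=lambda i: rows[i][1])
    return set(order[:k])

def knn_impute(rows, missing, k=5):
    """Fill each missing target with the mean target of the k observed
    rows whose predictor value is closest."""
    observed = [i for i in range(len(rows)) if i not in missing]
    imputed = {}
    for i in missing:
        nearest = sorted(observed, key=lambda j: abs(rows[j][0] - rows[i][0]))[:k]
        imputed[i] = sum(rows[j][1] for j in nearest) / len(nearest)
    return imputed

def bias_and_rmse(rows, imputed):
    """Mean error and root mean squared error of imputed vs. true values."""
    errors = [imputed[i] - rows[i][1] for i in imputed]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse

# Toy run: one sample size, one missing percentage, two correlation levels.
rng = random.Random(0)
for rho in (0.2, 0.8):
    rows = simulate_data(150, rho, rng)
    for name, missing in (("MCAR", mask_mcar(rows, 0.30, rng)),
                          ("MNAR", mask_mnar(rows, 0.30))):
        bias, rmse = bias_and_rmse(rows, knn_impute(rows, missing))
        print(f"rho={rho} {name}: bias={bias:+.3f} rmse={rmse:.3f}")
```

Under this MNAR mask every donor value lies above every removed value, so the kNN estimate can only overshoot and the bias is positive by construction. Varying `rho`, the sample size, and the missing fraction mirrors the factors the study examined, though the numbers from this toy setup say nothing about the published results.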

Similar articles

1
A Workflow for Missing Values Imputation of Untargeted Metabolomics Data.
Metabolites. 2020 Nov 26;10(12):486. doi: 10.3390/metabo10120486.
2
Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies.
Metabolomics. 2018 Sep 20;14(10):128. doi: 10.1007/s11306-018-1420-2.
3
NS-kNN: a modified k-nearest neighbors approach for imputing metabolomics data.
Metabolomics. 2018 Nov 23;14(12):153. doi: 10.1007/s11306-018-1451-8.
4
Advanced methods for missing values imputation based on similarity learning.
PeerJ Comput Sci. 2021 Jul 21;7:e619. doi: 10.7717/peerj-cs.619. eCollection 2021.
5
Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies.
BMC Bioinformatics. 2017 Feb 20;18(1):114. doi: 10.1186/s12859-017-1547-6.
6
Missing value imputation in high-dimensional phenomic data: imputable or not, and how?
BMC Bioinformatics. 2014 Nov 5;15(1):346. doi: 10.1186/s12859-014-0346-6.
7
The performance of prognostic models depended on the choice of missing value imputation algorithm: a simulation study.
J Clin Epidemiol. 2024 Dec;176:111539. doi: 10.1016/j.jclinepi.2024.111539. Epub 2024 Sep 24.
8
Mechanism-aware imputation: a two-step approach in handling missing values in metabolomics.
BMC Bioinformatics. 2022 May 16;23(1):179. doi: 10.1186/s12859-022-04659-1.
9
Random forest-based imputation outperforms other methods for imputing LC-MS metabolomics data: a comparative study.
BMC Bioinformatics. 2019 Oct 11;20(1):492. doi: 10.1186/s12859-019-3110-0.
10
Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data.
Sci Rep. 2018 Jan 12;8(1):663. doi: 10.1038/s41598-017-19120-0.

Cited by

1
Information-Content-Informed Kendall-tau Correlation Methodology: Interpreting Missing Values as Useful Information.
bioRxiv. 2025 Jul 21:2022.02.24.481854. doi: 10.1101/2022.02.24.481854.
2
Development and internal verification of nomogram for forecasting delirium in the elderly admitted to intensive care units: an analysis of MIMIC-IV database.
Front Neurol. 2025 May 13;16:1580125. doi: 10.3389/fneur.2025.1580125. eCollection 2025.
3
Description of metabolic differences between castrated males and intact gilts obtained from high-throughput metabolomics of porcine plasma.
J Anim Sci. 2025 Jan 4;103. doi: 10.1093/jas/skaf178.
4
Robust Metabolomic Age Prediction Based on a Wide Selection of Metabolites.
J Gerontol A Biol Sci Med Sci. 2025 Feb 10;80(3). doi: 10.1093/gerona/glae280.
5
The Potential of Metabolomics in Colorectal Cancer Prognosis.
Metabolites. 2024 Dec 15;14(12):708. doi: 10.3390/metabo14120708.
6
Development of a metabolomic risk score for exposure to traffic-related air pollution: A multi-cohort study.
Environ Res. 2024 Dec 15;263(Pt 3):120172. doi: 10.1016/j.envres.2024.120172. Epub 2024 Oct 16.
7
Steroid Hormone Biosynthesis and Dietary Related Metabolites Associated with Excessive Daytime Sleepiness.
medRxiv. 2024 Sep 13:2024.09.12.24313561. doi: 10.1101/2024.09.12.24313561.
8
Interpretable machine learning model for early prediction of delirium in elderly patients following intensive care unit admission: a derivation and validation study.
Front Med (Lausanne). 2024 May 17;11:1399848. doi: 10.3389/fmed.2024.1399848. eCollection 2024.
9
Untargeted Metabolomics and Body Mass in Adolescents: A Cross-Sectional and Longitudinal Analysis.
Metabolites. 2023 Jul 30;13(8):899. doi: 10.3390/metabo13080899.
10
Comprehensive Two-Dimensional Gas Chromatography as a Bioanalytical Platform for Drug Discovery and Analysis.
Pharmaceutics. 2023 Mar 31;15(4):1121. doi: 10.3390/pharmaceutics15041121.

References

1
Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies.
Metabolomics. 2018 Sep 20;14(10):128. doi: 10.1007/s11306-018-1420-2.
2
Using simulation studies to evaluate statistical methods.
Stat Med. 2019 May 20;38(11):2074-2102. doi: 10.1002/sim.8086. Epub 2019 Jan 16.
3
Variability of Two Metabolomic Platforms in CKD.
Clin J Am Soc Nephrol. 2019 Jan 7;14(1):40-48. doi: 10.2215/CJN.07070618. Epub 2018 Dec 20.
4
Profound Perturbation of the Metabolome in Obesity Is Associated with Health Risk.
Cell Metab. 2019 Feb 5;29(2):488-500.e2. doi: 10.1016/j.cmet.2018.09.022. Epub 2018 Oct 11.
5
Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data.
Sci Rep. 2018 Jan 12;8(1):663. doi: 10.1038/s41598-017-19120-0.
6
Large Scale Metabolic Profiling identifies Novel Steroids linked to Rheumatoid Arthritis.
Sci Rep. 2017 Aug 22;7(1):9137. doi: 10.1038/s41598-017-05439-1.
7
Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies.
BMC Bioinformatics. 2017 Feb 20;18(1):114. doi: 10.1186/s12859-017-1547-6.
8
Untargeted Metabolomics Strategies-Challenges and Emerging Directions.
J Am Soc Mass Spectrom. 2016 Dec;27(12):1897-1905. doi: 10.1007/s13361-016-1469-y. Epub 2016 Sep 13.
9
Non-targeted UHPLC-MS metabolomic data processing methods: a comparative investigation of normalisation, missing value imputation, transformation and scaling.
Metabolomics. 2016;12:93. doi: 10.1007/s11306-016-1030-9. Epub 2016 Apr 15.
10
Multiple Imputation for General Missing Data Patterns in the Presence of High-dimensional Data.
Sci Rep. 2016 Feb 12;6:21689. doi: 10.1038/srep21689.