GMSimpute: a generalized two-step Lasso approach to impute missing values in label-free mass spectrum analysis.

Author information

Li Qian, Fisher Kate, Meng Wenjun, Fang Bin, Welsh Eric, Haura Eric B, Koomen John M, Eschrich Steven A, Fridley Brooke L, Chen Y Ann

Affiliations

Health Informatics Institute, University of South Florida, Tampa, FL, USA.

Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA.

Publication information

Bioinformatics. 2020 Jan 1;36(1):257-263. doi: 10.1093/bioinformatics/btz488.

DOI:10.1093/bioinformatics/btz488

PMID:31199438

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6956786/

Abstract

MOTIVATION

Missingness is inherent to label-free mass spectrometry, so computational approaches that recover missing values in metabolomics and proteomics datasets are important. Most existing methods are designed under a single assumption: either that values are missing at random or that they fall below the detection limit. If the actual missing pattern deviates from that assumption, the results may be biased. We therefore investigate the missing patterns in label-free mass spectrometry data and develop GMSimpute, an omnibus approach that imputes effectively across different missing patterns.
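
To make the two assumptions concrete, the sketch below masks one feature's abundances with a mixture of both mechanisms. Values under a detection limit go missing (abundance-dependent, left-censored), and the remaining values go missing completely at random (abundance-independent). This is an illustrative simulation under stated assumptions, not the simulation design used in the paper. The function name `simulate_mixed_missingness`, the quantile-based detection limit, and the parameters `detection_quantile` and `mar_rate` are all hypothetical.

```python
import random


def simulate_mixed_missingness(values, detection_quantile=0.2, mar_rate=0.1, seed=0):
    """Mask one feature's abundances with a mixture of two mechanisms.

    Hypothetical helper for illustration only.
    Abundance-dependent: values below a hypothetical detection limit
    (the `detection_quantile` empirical quantile) become None.
    Abundance-independent: each remaining value is then dropped with
    probability `mar_rate`, regardless of its size.
    Returns the masked list and the detection limit that was used.
    """
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)
    ordered = sorted(values)
    # Clamp the index so detection_quantile=1.0 does not overrun the list.
    k = min(int(detection_quantile * len(ordered)), len(ordered) - 1)
    limit = ordered[k]
    masked = []
    for v in values:
        if v < limit:
            masked.append(None)  # left-censored: below the detection limit
        elif rng.random() < mar_rate:
            masked.append(None)  # missing independently of abundance
        else:
            masked.append(v)     # observed
    return masked, limit


if __name__ == "__main__":
    abundances = [float(i) for i in range(1, 11)]
    masked, limit = simulate_mixed_missingness(abundances, mar_rate=0.0)
    print(limit)   # 3.0
    print(masked)  # [None, None, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
```

Setting `mar_rate=0` gives a purely detection-limit pattern, and `detection_quantile=0` gives a purely random one. Intermediate settings produce the kind of mixture that a method built for only one assumption would handle poorly.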

RESULTS

Three proteomics datasets and one metabolomics dataset indicate that missing values can be a mixture of abundance-dependent and abundance-independent missingness. We assess the performance of GMSimpute using simulated data covering a wide range of missing patterns (80 in total) and metabolomics data from The Cancer Genome Atlas (TCGA) breast cancer and clear cell renal cell carcinoma studies. Using the Pearson correlation and the normalized root mean square error (NRMSE) between the true and imputed abundances, we compare its performance with K-nearest neighbor (KNN)-type approaches, Random Forest, GSimp, a model-based method implemented in DanteR, and minimum-value imputation. The results indicate that GMSimpute imputes more accurately and performs stably across different missing patterns. In addition, when applied to the TCGA datasets, GMSimpute identifies features in downstream differential expression analysis with high accuracy.
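
The evaluation protocol described above can be sketched as follows: mask known values, impute them, then compare true and imputed abundances with the Pearson correlation and NRMSE. This is a minimal standard-library illustration. It assumes features are rows stored as Python lists, with None marking missing entries. It uses the minimum-value baseline named above, not GMSimpute itself. It also assumes that NRMSE is the RMSE divided by the standard deviation of the true values, because the abstract does not state the normalization. The helper names `impute_minimum`, `pearson`, `nrmse`, and `evaluate` are hypothetical.

```python
import math
from statistics import mean, pstdev


def impute_minimum(matrix):
    """Baseline: replace each feature's missing entries (None) with the
    smallest observed value of that feature (features are rows)."""
    imputed = []
    for row in matrix:
        fill = min(v for v in row if v is not None)
        imputed.append([fill if v is None else v for v in row])
    return imputed


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def nrmse(true, imputed):
    """RMSE scaled by the population SD of the true values (an assumed
    normalization; range- or mean-based scaling is also common)."""
    rmse = math.sqrt(mean((t - p) ** 2 for t, p in zip(true, imputed)))
    return rmse / pstdev(true)


def evaluate(true, masked, imputed):
    """Pool every masked entry (None in `masked`) across all features and
    return (Pearson r, NRMSE) between true and imputed abundances."""
    t, p = [], []
    for true_row, mask_row, imp_row in zip(true, masked, imputed):
        for tv, mv, iv in zip(true_row, mask_row, imp_row):
            if mv is None:
                t.append(tv)
                p.append(iv)
    return pearson(t, p), nrmse(t, p)


if __name__ == "__main__":
    true = [[1.0, 2.0, 3.0, 4.0],
            [10.0, 12.0, 14.0, 16.0],
            [5.0, 6.0, 7.0, 8.0]]
    masked = [[None, 2.0, 3.0, 4.0],
              [None, 12.0, 14.0, 16.0],
              [5.0, None, 7.0, 8.0]]
    r, err = evaluate(true, masked, impute_minimum(masked))
    print(f"Pearson r = {r:.3f}, NRMSE = {err:.3f}")
```

The masked entries are pooled across features before scoring. Minimum-value imputation fills every gap within a single feature with the same constant, which would leave a per-feature Pearson correlation undefined.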

AVAILABILITY AND IMPLEMENTATION

GMSimpute is available on CRAN: https://cran.r-project.org/web/packages/GMSimpute/index.html.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies.

PLoS Comput Biol. 2018 Jan 31;14(1):e1005973. doi: 10.1371/journal.pcbi.1005973. eCollection 2018 Jan.

Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data.

Sci Rep. 2018 Jan 12;8(1):663. doi: 10.1038/s41598-017-19120-0.

Mechanism-aware imputation: a two-step approach in handling missing values in metabolomics.

BMC Bioinformatics. 2022 May 16;23(1):179. doi: 10.1186/s12859-022-04659-1.

NS-kNN: a modified k-nearest neighbors approach for imputing metabolomics data.

Metabolomics. 2018 Nov 23;14(12):153. doi: 10.1007/s11306-018-1451-8.

rMisbeta: A robust missing value imputation approach in transcriptomics and metabolomics data.

Comput Biol Med. 2021 Nov;138:104911. doi: 10.1016/j.compbiomed.2021.104911. Epub 2021 Sep 29.

Random forest-based imputation outperforms other methods for imputing LC-MS metabolomics data: a comparative study.

BMC Bioinformatics. 2019 Oct 11;20(1):492. doi: 10.1186/s12859-019-3110-0.

NMF-Based Approach for Missing Values Imputation of Mass Spectrometry Metabolomics Data.

Molecules. 2021 Sep 24;26(19):5787. doi: 10.3390/molecules26195787.

Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies.

BMC Bioinformatics. 2017 Feb 20;18(1):114. doi: 10.1186/s12859-017-1547-6.

A Workflow for Missing Values Imputation of Untargeted Metabolomics Data.

Metabolites. 2020 Nov 26;10(12):486. doi: 10.3390/metabo10120486.

Cited by

Missing Values in Longitudinal Proteome Dynamics Studies: Making a Case for Data Multiple Imputation.

J Proteome Res. 2024 Sep 6;23(9):4151-4162. doi: 10.1021/acs.jproteome.4c00263. Epub 2024 Aug 27.

Optimizing differential expression analysis for proteomics data via high-performing rules and ensemble inference.

Nat Commun. 2024 May 9;15(1):3922. doi: 10.1038/s41467-024-47899-w.

Inceptor facilitates acrosomal vesicle formation in spermatids and is required for male fertility.

Front Cell Dev Biol. 2023 Aug 24;11:1240039. doi: 10.3389/fcell.2023.1240039. eCollection 2023.

Statistics and Machine Learning in Mass Spectrometry-Based Metabolomics Analysis.

Methods Mol Biol. 2023;2629:247-269. doi: 10.1007/978-1-0716-2986-4_12.

Multiple Imputation Approaches Applied to the Missing Value Problem in Bottom-Up Proteomics.

Int J Mol Sci. 2021 Sep 6;22(17):9650. doi: 10.3390/ijms22179650.

OptiMissP: A dashboard to assess missingness in proteomic data-independent acquisition mass spectrometry.

PLoS One. 2021 Apr 15;16(4):e0249771. doi: 10.1371/journal.pone.0249771. eCollection 2021.

Plasma Metabolome and Circulating Vitamins Stratified Onset Age of an Initial Islet Autoantibody and Progression to Type 1 Diabetes: The TEDDY Study.

Diabetes. 2021 Jan;70(1):282-292. doi: 10.2337/db20-0696. Epub 2020 Oct 26.

Managing a Large-Scale Multiomics Project: A Team Science Case Study in Proteogenomics.

Methods Mol Biol. 2021;2194:187-221. doi: 10.1007/978-1-0716-0849-4_11.

NAguideR: performing and prioritizing missing value imputations for consistent bottom-up proteomic analyses.

Nucleic Acids Res. 2020 Aug 20;48(14):e83. doi: 10.1093/nar/gkaa498.

Longitudinal Metabolome-Wide Signals Prior to the Appearance of a First Islet Autoantibody in Children Participating in the TEDDY Study.

Diabetes. 2020 Mar;69(3):465-476. doi: 10.2337/db19-0756. Epub 2020 Feb 6.

References

GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies.

PLoS Comput Biol. 2018 Jan 31;14(1):e1005973. doi: 10.1371/journal.pcbi.1005973. eCollection 2018 Jan.

Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data.

Sci Rep. 2018 Jan 12;8(1):663. doi: 10.1038/s41598-017-19120-0.

Missing value imputation for LC-MS metabolomics data by incorporating metabolic network and adduct ion relations.

Bioinformatics. 2018 May 1;34(9):1555-1561. doi: 10.1093/bioinformatics/btx816.

Detailed Investigation and Comparison of the XCMS and MZmine 2 Chromatogram Construction and Chromatographic Peak Detection Methods for Preprocessing Mass Spectrometry Metabolomics Data.

Anal Chem. 2017 Sep 5;89(17):8689-8695. doi: 10.1021/acs.analchem.7b01069. Epub 2017 Aug 17.

Metabolomics-Proteomics Combined Approach Identifies Differential Metabolism-Associated Molecular Events between Senescence and Apoptosis.

J Proteome Res. 2017 Jun 2;16(6):2250-2261. doi: 10.1021/acs.jproteome.7b00111. Epub 2017 May 10.

Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies.

BMC Bioinformatics. 2017 Feb 20;18(1):114. doi: 10.1186/s12859-017-1547-6.

The MaxQuant computational platform for mass spectrometry-based shotgun proteomics.

Nat Protoc. 2016 Dec;11(12):2301-2319. doi: 10.1038/nprot.2016.136. Epub 2016 Oct 27.

An Integrated Metabolic Atlas of Clear Cell Renal Cell Carcinoma.

Cancer Cell. 2016 Jan 11;29(1):104-116. doi: 10.1016/j.ccell.2015.12.004.

Missing value imputation strategies for metabolomics data.

Electrophoresis. 2015 Dec;36(24):3050-60. doi: 10.1002/elps.201500352. Epub 2015 Oct 20.

4-protein signature predicting tamoxifen treatment outcome in recurrent breast cancer.

Mol Oncol. 2016 Jan;10(1):24-39. doi: 10.1016/j.molonc.2015.07.004. Epub 2015 Aug 7.
