Discrepancies in metabolomic biomarker identification from patient-derived lung cancer revealed by combined variation in data pre-treatment and imputation methods.

Author affiliations

Department of Pharmacology and Toxicology, University of Louisville, Louisville, KY, USA.

Department of Bioengineering, University of Louisville, Louisville, KY, USA.

Publication information

Metabolomics. 2021 Mar 27;17(4):37. doi: 10.1007/s11306-021-01787-2.

PMID:33772663

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8138701/

Abstract

INTRODUCTION

The identification of metabolomic biomarkers predictive of cancer patient response to therapy and of disease stage has been pursued as a "holy grail" of modern oncology, relying on the metabolic dysfunction that characterizes cancer progression. In spite of the evaluation of many candidate biomarkers, however, determination of a consistent set with practical clinical utility has proven elusive.

OBJECTIVE

In this study, we systematically examine the combined role of data pre-treatment and imputation methods on the performance of multivariate data analysis methods and their identification of potential biomarkers.

METHODS

Uniquely, we are able to systematically evaluate both unsupervised and supervised methods with a metabolomic data set obtained from patient-derived lung cancer core biopsies with true missing values. Eight pre-treatment methods, ten imputation methods, and two data analysis methods were applied in combination.
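
The combinatorial design described above can be sketched as a small grid search. This is a minimal illustration only, not the authors' pipeline: the abstract does not name the eight pre-treatment or ten imputation methods, so the three of each used here (log transform, autoscaling, Pareto scaling; mean, median, and minimum fill) are assumptions, as are the toy data and every function name.

```python
import math
from itertools import product
from statistics import mean, median, stdev

# Toy intensity table (rows = samples, columns = metabolites); None marks a missing value.
# The numbers are invented for illustration and are not from the study.
DATA = [
    [120.0, 8.0, None, 300.0],
    [150.0, None, 41.0, 280.0],
    [90.0, 6.5, 39.0, None],
    [400.0, 7.2, 55.0, 310.0],
]

def observed(col):
    """Return the non-missing values of a column."""
    return [v for v in col if v is not None]

def apply_observed(col, fn):
    """Apply fn to every observed value, leaving missing entries as None."""
    return [None if v is None else fn(v) for v in col]

# Hypothetical pre-treatment options, each applied per metabolite to observed values only.
def log_transform(col):
    return apply_observed(col, math.log10)

def autoscale(col):
    mu, sd = mean(observed(col)), stdev(observed(col))
    return apply_observed(col, lambda v: (v - mu) / sd)

def pareto_scale(col):
    mu, sd = mean(observed(col)), stdev(observed(col))
    return apply_observed(col, lambda v: (v - mu) / math.sqrt(sd))

# Hypothetical imputation options: fill missing entries with a per-metabolite statistic.
def fill_with(stat):
    def impute(col):
        fill = stat(observed(col))
        return [fill if v is None else v for v in col]
    return impute

PRETREATMENTS = {"log": log_transform, "auto": autoscale, "pareto": pareto_scale}
IMPUTATIONS = {"mean": fill_with(mean), "median": fill_with(median), "min": fill_with(min)}

def run_grid(data):
    """Return {(pretreatment, imputation): completed matrix} for every combination."""
    cols = [list(c) for c in zip(*data)]
    results = {}
    for (p_name, pre), (i_name, imp) in product(PRETREATMENTS.items(), IMPUTATIONS.items()):
        # Pre-treat first, then impute, so the fill value is computed on the treated scale.
        completed = [imp(pre(col)) for col in cols]
        results[(p_name, i_name)] = [list(row) for row in zip(*completed)]
    return results

results = run_grid(DATA)
for combo in sorted(results):
    # Show the first sample under each combination to make the differences visible.
    print(combo, [round(v, 3) for v in results[combo][0]])
```

In a fuller version of this sketch, each completed matrix would be passed to the downstream unsupervised and supervised analyses, and the resulting candidate biomarker lists would be compared across all combinations.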

RESULTS

The combined choice of pre-treatment and imputation methods is critical in the definition of candidate biomarkers, with deficient or inappropriate selection of these methods leading to inconsistent results, and with important biomarkers either being overlooked or reported as a false positive. The log transformation appeared to normalize the original tumor data most effectively, but the performance of the imputation applied after the transformation was highly dependent on the characteristics of the data set.
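
One reason the interaction between transformation and imputation matters can be shown with a small numerical sketch. It is an illustrative assumption, not an analysis from the study: the intensities are invented, and mean imputation stands in for whichever imputation methods the authors evaluated. The point is only that filling a missing value before or after a log transform yields different numbers.

```python
import math
from statistics import mean

# Invented right-skewed intensities for one metabolite; None marks the missing sample.
intensities = [90.0, 120.0, 150.0, 400.0, None]
observed = [v for v in intensities if v is not None]

# Order A: impute the raw mean, then log-transform the filled value.
impute_then_log = math.log10(mean(observed))

# Order B: log-transform first, then impute the mean of the logged values.
log_then_impute = mean([math.log10(v) for v in observed])

print("impute then log:", round(impute_then_log, 4))
print("log then impute:", round(log_then_impute, 4))
print("difference:", round(impute_then_log - log_then_impute, 4))
```

Because the logarithm is concave, the log of the arithmetic mean is never smaller than the mean of the logs, and the gap widens as the distribution becomes more skewed. This is one way in which the characteristics of a data set can change what an imputation method fills in.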

CONCLUSION

The combined choice of pre-treatment and imputation methods may need careful evaluation prior to metabolomic data analysis of human tumors, in order to enable consistent identification of potential biomarkers predictive of response to therapy and of disease stage.
