Discrepancies in metabolomic biomarker identification from patient-derived lung cancer revealed by combined variation in data pre-treatment and imputation methods.

Affiliations

Department of Pharmacology and Toxicology, University of Louisville, Louisville, KY, USA.

Department of Bioengineering, University of Louisville, Louisville, KY, USA.

Publication information

Metabolomics. 2021 Mar 27;17(4):37. doi: 10.1007/s11306-021-01787-2.

Abstract

INTRODUCTION

The identification of metabolomic biomarkers predictive of cancer patient response to therapy and of disease stage, which relies on the metabolic dysfunction that characterizes cancer progression, has been pursued as a "holy grail" of modern oncology. Despite the evaluation of many candidate biomarkers, however, a consistent set with practical clinical utility has proven elusive.

OBJECTIVE

In this study, we systematically examine the combined role of data pre-treatment and imputation methods on the performance of multivariate data analysis methods and their identification of potential biomarkers.

METHODS

Uniquely, we are able to systematically evaluate both unsupervised and supervised methods on a metabolomic data set, containing true missing values, obtained from patient-derived lung cancer core biopsies. Eight pre-treatment methods, ten imputation methods, and two data analysis methods were applied in combination.
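The combinatorial design can be sketched in Python as a grid over pre-treatment and imputation methods, each feeding a simple two-group ranking of metabolites. This is a minimal sketch under stated assumptions: the four pre-treatments, three imputations, the Welch t statistic used for ranking, and the toy data are all hypothetical illustrations. They are not the paper's eight pre-treatment methods, ten imputation methods, or its unsupervised and supervised analysis methods, none of which the abstract names. Following the abstract, imputation is applied after pre-treatment.

```python
import math
import statistics
from itertools import product

# Hypothetical toy matrix: rows are samples, columns are metabolites.
# None marks a missing value. All numbers and group labels are invented.
DATA = [
    [120.0, 3.1, None, 850.0],
    [135.0, 2.8, 14.0, 910.0],
    [110.0, None, 11.0, 780.0],
    [300.0, 6.5, 40.0, None],
    [280.0, 7.2, None, 1020.0],
    [320.0, 5.9, 38.0, 990.0],
]
GROUPS = [0, 0, 0, 1, 1, 1]  # hypothetical class labels


def observed(col):
    """Non-missing values of one column."""
    return [v for v in col if v is not None]


# Pre-treatments (assumed examples), applied per column; None passes through.
def no_pretreatment(col):
    return list(col)


def log_transform(col):
    return [None if v is None else math.log(v) for v in col]


def autoscale(col):
    mu, sd = statistics.mean(observed(col)), statistics.stdev(observed(col))
    return [None if v is None else (v - mu) / sd for v in col]


def pareto_scale(col):
    mu, sd = statistics.mean(observed(col)), statistics.stdev(observed(col))
    return [None if v is None else (v - mu) / math.sqrt(sd) for v in col]


# Imputations (assumed examples): fill None from the column's observed values.
def impute_with(fill):
    def imputer(col):
        value = fill(observed(col))
        return [value if v is None else v for v in col]
    return imputer


PRETREATMENTS = {"none": no_pretreatment, "log": log_transform,
                 "autoscale": autoscale, "pareto": pareto_scale}
IMPUTATIONS = {
    # Half-minimum presumes positive data; after centering, min/2 is no
    # longer below every observed value.
    "half_min": impute_with(lambda obs: min(obs) / 2),
    "mean": impute_with(statistics.mean),
    "median": impute_with(statistics.median),
}


def welch_t(a, b):
    """Welch two-sample t statistic (group b minus group a)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return 0.0 if se == 0 else (statistics.mean(b) - statistics.mean(a)) / se


def rank_metabolites(matrix, groups, pretreat, impute):
    """Metabolite indices ordered by |t|, largest first (pre-treat, then impute)."""
    scores = []
    for j, col in enumerate(zip(*matrix)):
        processed = impute(pretreat(col))
        a = [v for v, g in zip(processed, groups) if g == 0]
        b = [v for v, g in zip(processed, groups) if g == 1]
        scores.append((abs(welch_t(a, b)), j))
    return [j for _, j in sorted(scores, reverse=True)]


def run_grid(matrix, groups):
    """Apply every pre-treatment x imputation combination."""
    return {
        (p_name, i_name): rank_metabolites(matrix, groups, p, i)
        for (p_name, p), (i_name, i) in product(PRETREATMENTS.items(), IMPUTATIONS.items())
    }


if __name__ == "__main__":
    for combo, ranking in run_grid(DATA, GROUPS).items():
        print(combo, "-> metabolite ranking:", ranking)
```

Because autoscaling and Pareto scaling are per-column positive affine maps, and |t| is unchanged by such maps, in this sketch they can alter a ranking only through an imputation step that does not commute with centering and scaling, such as half-minimum. Mean and median imputation reproduce the no-pre-treatment statistics up to floating-point rounding. This is one concrete way in which the combined choice, rather than either method alone, decides the result.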

RESULTS

The combined choice of pre-treatment and imputation methods is critical in defining candidate biomarkers: deficient or inappropriate selection of these methods leads to inconsistent results, with important biomarkers either overlooked or false positives reported. The log transformation appeared to normalize the original tumor data most effectively, but the performance of the imputation applied after the transformation was highly dependent on the characteristics of the data set.
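Two effects behind this result can be illustrated with a minimal Python sketch: a log transformation reduces the right skew typical of raw intensities, and imputing after the transformation yields a different fill value than imputing before it. The intensities below are invented, and the sample skewness and mean imputation are assumed stand-ins; the abstract does not state which normality measure or which of its ten imputation methods were used.

```python
import math
import statistics


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1); 0 means symmetric."""
    n = len(values)
    mu = statistics.mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)


def mean_impute(values):
    """Replace None with the mean of the observed values."""
    fill = statistics.mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


# Hypothetical, roughly log-normal intensities for one metabolite (invented).
INTENSITIES = [10.0, 14.0, 20.0, 28.0, 40.0, 56.0, 80.0, 112.0, 160.0]
raw_skew = skewness(INTENSITIES)
log_skew = skewness([math.log(v) for v in INTENSITIES])

# The same kind of metabolite with one missing value (also invented).
WITH_GAP = [10.0, 14.0, None, 28.0, 40.0, 160.0]
# Order A: impute on the raw scale, then log-transform.
impute_then_log = [math.log(v) for v in mean_impute(WITH_GAP)]
# Order B (transformation first, as in the abstract): log-transform, then impute.
log_then_impute = mean_impute([None if v is None else math.log(v) for v in WITH_GAP])

if __name__ == "__main__":
    print(f"skewness raw={raw_skew:.2f}  log={log_skew:.2f}")
    # Order B fills with the log of the geometric mean of the observed values,
    # which is below the log of the arithmetic mean used by order A.
    print("imputed (log scale): A =", round(impute_then_log[2], 3),
          " B =", round(log_then_impute[2], 3))
```

Mean imputation on the log scale is equivalent to filling with the geometric mean on the raw scale, which never exceeds the arithmetic mean, so the order of the two steps systematically shifts the imputed values.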

CONCLUSION

The combined choice of pre-treatment and imputation methods may need careful evaluation prior to metabolomic data analysis of human tumors, in order to enable consistent identification of potential biomarkers predictive of response to therapy and of disease stage.
