Metabolomic Biomarker Identification in Presence of Outliers and Missing Values.

Authors

Kumar Nishith, Hoque Md Aminul, Shahjaman Md, Islam S M Shahinul, Mollah Md Nurul Haque

Affiliations

Bioinformatics Lab, Department of Statistics, Rajshahi University, Rajshahi, Bangladesh; Department of Statistics, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, Bangladesh.

Bioinformatics Lab, Department of Statistics, Rajshahi University, Rajshahi, Bangladesh.

Publication information

Biomed Res Int. 2017;2017:2437608. doi: 10.1155/2017/2437608. Epub 2017 Feb 14.

DOI:10.1155/2017/2437608

PMID:28293630

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5331169/

Abstract

Metabolomics is a sophisticated, high-throughput technology based on the entire set of metabolites, which is known as the connector between genotypes and phenotypes. For any phenotypic change, identifying potential metabolites (biomarkers) is very important because it provides diagnostic as well as prognostic markers and can help to develop new biomolecular therapies. Biomarker identification from metabolomics data is hampered by the fact that high-throughput technology produces high-dimensional data matrices that contain missing values as well as outliers. Missing value imputation and outlier handling techniques therefore play an important role in identifying biomarkers correctly. Although several missing value imputation techniques are available, outliers deteriorate the accuracy of imputation as well as the accuracy of biomarker identification. Therefore, in this paper we propose a new biomarker identification technique combining groupwise robust singular value decomposition, the t-test, and the fold-change approach, which can identify biomarkers more correctly from metabolomics datasets. We also compare the performance of the proposed technique with that of other traditional techniques for biomarker identification, using both simulated and real data analysis in the absence and presence of outliers. Applying the proposed method to a hepatocellular carcinoma (HCC) dataset, we identified four upregulated and two downregulated metabolites as potential metabolomic biomarkers for HCC.
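The t-test and fold-change filtering named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the groupwise robust singular value decomposition step entirely and assumes the data are already complete and free of outliers. The function names, the toy intensities, and the cutoffs (`t_cut`, `lfc_cut`) are hypothetical, and a cutoff on the absolute Welch t-statistic stands in for a p-value threshold so that only the Python standard library is needed.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def log2_fold_change(case, control):
    """log2 ratio of group means; assumes strictly positive intensities."""
    return math.log2(statistics.mean(case) / statistics.mean(control))


def select_biomarkers(case, control, t_cut=3.0, lfc_cut=1.0):
    """Keep metabolites that pass BOTH filters (hypothetical cutoffs).

    case, control: dict mapping metabolite name -> list of intensities.
    Returns a dict mapping each selected metabolite to "up" or "down".
    """
    selected = {}
    for name in case:
        t = welch_t(case[name], control[name])
        lfc = log2_fold_change(case[name], control[name])
        if abs(t) >= t_cut and abs(lfc) >= lfc_cut:
            selected[name] = "up" if lfc > 0 else "down"
    return selected


# Toy data (invented for illustration, not from the paper).
control = {
    "m1": [10.0, 11.0, 9.0, 10.5, 9.5],
    "m2": [20.0, 21.0, 19.0, 20.5, 19.5],
    "m3": [40.0, 42.0, 38.0, 41.0, 39.0],
}
case = {
    "m1": [30.0, 33.0, 27.0, 31.0, 29.0],  # about 3x higher than control
    "m2": [20.5, 19.5, 21.0, 20.0, 19.0],  # unchanged
    "m3": [10.0, 11.0, 9.0, 10.5, 9.5],    # about 4x lower than control
}
result = select_biomarkers(case, control)
print(result)
```

Requiring both filters mirrors the usual volcano-plot logic: a metabolite is reported only if its group difference is both statistically consistent and large in magnitude. Both the mean and the variance used here are sensitive to outliers, which is the weakness the paper's robust preprocessing is meant to address.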

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4272/5331169/a547591e55a4/BMRI2017-2437608.001.jpg

Similar articles

Metabolomic Biomarker Identification in Presence of Outliers and Missing Values.

Biomed Res Int. 2017;2017:2437608. doi: 10.1155/2017/2437608. Epub 2017 Feb 14.

rMisbeta: A robust missing value imputation approach in transcriptomics and metabolomics data.

Comput Biol Med. 2021 Nov;138:104911. doi: 10.1016/j.compbiomed.2021.104911. Epub 2021 Sep 29.

A novel analysis method for biomarker identification based on horizontal relationship: identifying potential biomarkers from large-scale hepatocellular carcinoma metabolomics data.

Anal Bioanal Chem. 2019 Sep;411(24):6377-6386. doi: 10.1007/s00216-019-02011-w. Epub 2019 Aug 5.

Kernel weighted least square approach for imputing missing values of metabolomics data.

Sci Rep. 2021 May 27;11(1):11108. doi: 10.1038/s41598-021-90654-0.

A Computational Selection of Metabolite Biomarkers Using Emerging Pattern Mining: A Case Study in Human Hepatocellular Carcinoma.

J Proteome Res. 2017 Jun 2;16(6):2240-2249. doi: 10.1021/acs.jproteome.7b00054. Epub 2017 May 3.

Metabolomic profiling of human urine in hepatocellular carcinoma patients using gas chromatography/mass spectrometry.

Anal Chim Acta. 2009 Aug 19;648(1):98-104. doi: 10.1016/j.aca.2009.06.033. Epub 2009 Jun 21.

Proteomic and metabonomic biomarkers for hepatocellular carcinoma: a comprehensive review.

Br J Cancer. 2015 Mar 31;112(7):1141-56. doi: 10.1038/bjc.2015.38.

Development of urinary pseudotargeted LC-MS-based metabolomics method and its application in hepatocellular carcinoma biomarker discovery.

J Proteome Res. 2015 Feb 6;14(2):906-16. doi: 10.1021/pr500973d. Epub 2014 Dec 17.

Power of metabolomics in diagnosis and biomarker discovery of hepatocellular carcinoma.

Hepatology. 2013 May;57(5):2072-7. doi: 10.1002/hep.26130. Epub 2013 Feb 15.

Phenotypic Characterization Analysis of Human Hepatocarcinoma by Urine Metabolomics Approach.

Sci Rep. 2016 Jan 25;6:19763. doi: 10.1038/srep19763.

Cited by

Metabolomic Profiling Reveals Biomarkers in Coronary Heart Disease Comorbidity.

J Diabetes Res. 2024 Dec 19;2024:8559677. doi: 10.1155/jdr/8559677. eCollection 2024.

Medical prediction from missing data with max-minus negative regularized dropout.

Front Neurosci. 2023 Jul 13;17:1221970. doi: 10.3389/fnins.2023.1221970. eCollection 2023.

How to deal with non-detectable and outlying values in biomarker research: Best practices and recommendations for univariate imputation approaches.

Compr Psychoneuroendocrinol. 2021 Mar 29;7:100052. doi: 10.1016/j.cpnec.2021.100052. eCollection 2021 Aug.

Metabolomic profiling of matured coconut water during post-harvest storage revealed discrimination and distinct changes in metabolites.

RSC Adv. 2018 Sep 6;8(55):31396-31405. doi: 10.1039/c8ra04213f. eCollection 2018 Sep 5.

Kernel weighted least square approach for imputing missing values of metabolomics data.

Sci Rep. 2021 May 27;11(1):11108. doi: 10.1038/s41598-021-90654-0.

Targeted metabolomics analysis of postoperative delirium.

Sci Rep. 2021 Jan 15;11(1):1521. doi: 10.1038/s41598-020-80412-z.

An Efficient and Effective Model to Handle Missing Data in Classification.

Biomed Res Int. 2020 Nov 25;2020:8810143. doi: 10.1155/2020/8810143. eCollection 2020.

Metabolomic Profiling Revealed Potential Biomarkers in Patients With Moyamoya Disease.

Front Neurosci. 2020 Apr 21;14:308. doi: 10.3389/fnins.2020.00308. eCollection 2020.

Predictive Modeling for Metabolomics Data.

Methods Mol Biol. 2020;2104:313-336. doi: 10.1007/978-1-0716-0239-3_16.

Robust volcano plot: identification of differential metabolites in the presence of outliers.

BMC Bioinformatics. 2018 Apr 11;19(1):128. doi: 10.1186/s12859-018-2117-2.
