Metabolomic Biomarker Identification in Presence of Outliers and Missing Values.

Author information

Kumar Nishith, Hoque Md Aminul, Shahjaman Md, Islam S M Shahinul, Mollah Md Nurul Haque

Affiliations

Bioinformatics Lab, Department of Statistics, Rajshahi University, Rajshahi, Bangladesh; Department of Statistics, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, Bangladesh.

Bioinformatics Lab, Department of Statistics, Rajshahi University, Rajshahi, Bangladesh.

Publication information

Biomed Res Int. 2017;2017:2437608. doi: 10.1155/2017/2437608. Epub 2017 Feb 14.

Abstract

Metabolomics is a sophisticated, high-throughput technology that profiles the entire set of metabolites, which is known as the connector between genotypes and phenotypes. For any phenotypic change, identifying potential metabolites (biomarkers) is very important because they provide diagnostic and prognostic markers and can help in developing new biomolecular therapies. Biomarker identification from metabolomics data is hampered because high-throughput technology produces a high-dimensional data matrix that contains missing values as well as outliers. Missing value imputation and outlier handling therefore play an important role in identifying biomarkers correctly. Although several missing value imputation techniques are available, outliers degrade both the accuracy of imputation and the accuracy of biomarker identification. In this paper, we therefore propose a new biomarker identification technique that combines groupwise robust singular value decomposition, the t-test, and the fold-change approach, and that identifies biomarkers from metabolomics datasets more accurately. We also compare the performance of the proposed technique with that of other traditional biomarker identification techniques, using both simulated and real data in the absence and presence of outliers. Applying the proposed method to a hepatocellular carcinoma (HCC) dataset, we identified four upregulated and two downregulated metabolites as potential metabolomic biomarkers for HCC.
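The screening idea in the abstract (impute missing values within each group, then keep metabolites that pass both a t-test and a fold-change cut) can be illustrated with a minimal sketch. It assumes a samples-by-metabolites matrix with two groups (0 = control, 1 = case) and positive intensities. It is not the authors' implementation: groupwise median imputation stands in for their groupwise robust SVD, a Welch t statistic threshold stands in for a p-value cutoff, and the function names, the thresholds `t_cut` and `log2fc_cut`, and the toy data are all hypothetical.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple

# Rows are samples, columns are metabolites; None marks a missing value.
Matrix = List[List[Optional[float]]]


def impute_groupwise_median(X: Matrix, groups: Sequence[int]) -> List[List[float]]:
    """Fill each missing entry with that metabolite's median within the sample's own group.

    Hypothetical stand-in for groupwise robust SVD imputation: the median resists
    a few outliers but, unlike an SVD, ignores correlations between metabolites.
    """
    n_features = len(X[0])
    medians = {}
    for g in set(groups):
        rows = [row for row, gi in zip(X, groups) if gi == g]
        for j in range(n_features):
            observed = [row[j] for row in rows if row[j] is not None]
            if not observed:
                raise ValueError(f"metabolite {j} has no observed values in group {g}")
            medians[(g, j)] = statistics.median(observed)
    return [
        [x if x is not None else medians[(g, j)] for j, x in enumerate(row)]
        for row, g in zip(X, groups)
    ]


def welch_t(case: Sequence[float], control: Sequence[float]) -> float:
    """Welch two-sample t statistic (case minus control); needs >= 2 values per group."""
    diff = statistics.fmean(case) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(case) / len(case)
                   + statistics.variance(control) / len(control))
    if se == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def select_biomarkers(X: Matrix, groups: Sequence[int],
                      t_cut: float = 2.0, log2fc_cut: float = 1.0
                      ) -> List[Tuple[int, float, float, str]]:
    """Return (metabolite index, t, log2 fold change, 'up'/'down') for every
    metabolite passing both the |t| and |log2 FC| thresholds (hypothetical cutoffs)."""
    Y = impute_groupwise_median(X, groups)
    hits = []
    for j in range(len(Y[0])):
        case = [row[j] for row, g in zip(Y, groups) if g == 1]
        control = [row[j] for row, g in zip(Y, groups) if g == 0]
        t = welch_t(case, control)
        # Fold change of group means; assumes strictly positive intensities.
        log2fc = math.log2(statistics.fmean(case) / statistics.fmean(control))
        if abs(t) > t_cut and abs(log2fc) > log2fc_cut:
            hits.append((j, t, log2fc, "up" if log2fc > 0 else "down"))
    return hits


# Made-up toy data: 3 controls followed by 3 cases, 3 metabolites.
X = [
    [10.0, 5.0, 7.0],
    [12.0, 5.5, None],
    [11.0, None, 7.5],
    [40.0, 5.2, 2.0],
    [38.0, 4.8, 1.5],
    [None, 5.1, 2.2],
]
groups = [0, 0, 0, 1, 1, 1]
hits = select_biomarkers(X, groups)
for j, t, fc, direction in hits:
    print(f"metabolite {j}: t = {t:.1f}, log2 FC = {fc:.2f} ({direction})")
```

On this toy matrix, metabolite 0 is flagged as upregulated and metabolite 2 as downregulated, while metabolite 1 fails both cuts. Any single-value imputation shrinks within-group variance, so t statistics computed after imputing are somewhat optimistic; a real analysis would also correct for multiple testing across metabolites.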
