Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data.

Author information

Biomarkers & Nutrimetabolomics Laboratory, Nutrition, Food Science and Gastronomy Department, Food Technology Reference Net (XaRTA), Nutrition and Food Safety Research Institute (INSA-UB), Faculty of Pharmacy and Food Sciences, Pharmacy and Food Science Faculty, University of Barcelona, Barcelona, Spain.

CIBER Fragilidad y Envejecimiento Saludable [CIBERfes], Instituto de Salud Carlos III [ISCIII], Madrid, Spain.

Publication information

BMC Bioinformatics. 2018 Jan 2;19(1):1. doi: 10.1186/s12859-017-2006-0.

Abstract

BACKGROUND

Bioinformatic tools for the enrichment of 'omics' datasets facilitate the interpretation and understanding of data, but to date few are suitable for metabolomics datasets. The main objective of this work is to give, for the first time, a critical overview of the performance of these tools. To that end, datasets were selected from metabolomic repositories and enriched data were created. Both types of data were analysed with these tools, and the outputs were thoroughly examined.

RESULTS

An exploratory multivariate analysis of the most used tools for the enrichment of metabolite sets, based on non-metric multidimensional scaling (NMDS) of Jaccard distances, was performed and reflected their diversity. The codes (identifiers) of the metabolites in the datasets were searched in different metabolite databases (HMDB, KEGG, PubChem, ChEBI, BioCyc/HumanCyc, LipidMAPS, ChemSpider, METLIN and Recon2). PubChem contained the most identifiers for the metabolites in the datasets, followed by METLIN and ChEBI. However, these databases had duplicated entries and might present false positives. The performance of over-representation analysis (ORA) tools, including BioCyc/HumanCyc, ConsensusPathDB, IMPaLA, MBRole, MetaboAnalyst, Metabox, MetExplore, MPEA, PathVisio and Reactome, and of the mapping tool KEGGREST, was examined. Despite the variability of the tools, results were mostly consistent among tools and between real and enriched data. Nevertheless, a few discrepancies, such as differences in the total number of metabolites, were also found. Disease-based enrichment analyses were also assessed, but they were not found to be accurate, probably because metabolite disease sets are not up to date and because predicting diseases from a list of metabolites is difficult.
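As an illustration of two techniques named in the results, the following is a minimal Python sketch of a Jaccard distance between two sets and of an over-representation test based on the one-sided hypergeometric distribution. It is not the implementation used by any of the tools reviewed, which may apply different tests, background sets, or multiple-testing corrections. The function names and the metabolite identifiers (`M0` to `M9`) are hypothetical and chosen only for this example.

```python
from math import comb


def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention assumed here: two empty sets are identical
    return 1.0 - len(a & b) / len(union)


def ora_p_value(query, pathway, background):
    """One-sided hypergeometric p-value P(X >= k) for over-representation.

    background: all metabolites that could have been detected (size N)
    pathway:    metabolites annotated to one pathway (size K)
    query:      metabolites of interest, e.g. significant ones (size n)
    k:          overlap between query and pathway
    """
    background = set(background)
    # Restrict both sets to the background so the counts are comparable.
    query = set(query) & background
    pathway = set(pathway) & background
    N, K, n = len(background), len(pathway), len(query)
    k = len(query & pathway)
    # Sum the probabilities of observing k or more pathway members.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    background = [f"M{i}" for i in range(10)]
    pathway = ["M0", "M1", "M2"]
    query = ["M0", "M1", "M5"]
    print(ora_p_value(query, pathway, background))
    print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))
```

The choice of background matters in practice: the abstract notes that tools differed in the total number of metabolites they reported, and a different background size changes the resulting p-value even when the overlap is the same.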

CONCLUSIONS

We have extensively reviewed the state of the art of the available tools for metabolomic datasets, the completeness of metabolite databases, and the performance of ORA methods and disease-based analyses. Despite their variability, the tools provided consistent results regardless of their analytic approach. However, more work is required on the completeness of metabolite and pathway databases, which strongly affects the accuracy of enrichment analyses. Such improvements will translate into more accurate and global insights into the metabolome.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20e6/5749025/8de7c2224bd4/12859_2017_2006_Fig1_HTML.jpg
