Costa Christopher, Maraschin Marcelo, Rocha Miguel
CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal.
Plant Morphogenesis and Biochemistry Laboratory, Federal University of Santa Catarina, Florianopolis, Brazil.
Comput Methods Programs Biomed. 2016 Jun;129:117-24. doi: 10.1016/j.cmpb.2016.01.008. Epub 2016 Jan 14.
Interest in metabolomics has grown in recent years, reflected in a remarkable expansion of experimental techniques, available data and related biological applications. Techniques such as nuclear magnetic resonance, gas or liquid chromatography, mass spectrometry, and infrared and UV-visible spectroscopy have produced extensive datasets that can support tasks such as biological and biomedical discovery, biotechnology and drug development. However, as with other omics data, the analysis of metabolomics datasets poses multiple challenges, both in methodology and in the development of appropriate computational tools. Among the available software tools, none addresses the multiplicity of existing techniques and data analysis tasks. In this work, we make available a novel R package, named specmine, which provides a set of methods for metabolomics data analysis, including data loading in different formats, pre-processing, metabolite identification, univariate and multivariate data analysis, machine learning, and feature selection. Importantly, the implemented methods support the analysis of data from diverse experimental techniques, integrating a large set of functions from several R packages in a powerful yet simple-to-use environment. The package, already available on CRAN, is accompanied by a website where users can deposit datasets, scripts and analysis reports to be shared with the community, promoting the efficient sharing of metabolomics data analysis pipelines.
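To make the kind of pipeline described above concrete, the following is a minimal sketch of two of the listed steps, pre-processing and univariate analysis, applied to a tiny invented intensity table. It is written in Python using only the standard library and is not specmine's API: the function names (`log_transform`, `autoscale`, `welch_t`, `rank_features`), the peak names and the toy values are all assumptions made for illustration.

```python
import math
import statistics


def log_transform(matrix):
    """Apply log2(x + 1) to every intensity value (rows = samples)."""
    return [[math.log2(v + 1.0) for v in row] for row in matrix]


def autoscale(matrix):
    """Center each feature (column) to mean 0 and scale to unit sample SD."""
    columns = list(zip(*matrix))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in matrix
    ]


def welch_t(a, b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    denom = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    if denom == 0:
        return 0.0
    return (statistics.mean(a) - statistics.mean(b)) / denom


def rank_features(matrix, labels, names):
    """Rank features by |t| between the two classes found in `labels`."""
    groups = sorted(set(labels))
    if len(groups) != 2:
        raise ValueError("exactly two classes are required")
    scores = []
    for j, name in enumerate(names):
        a = [row[j] for row, lab in zip(matrix, labels) if lab == groups[0]]
        b = [row[j] for row, lab in zip(matrix, labels) if lab == groups[1]]
        scores.append((name, welch_t(a, b)))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: 6 samples x 3 spectral peaks (invented values).
names = ["peak_1", "peak_2", "peak_3"]
labels = ["control", "control", "control", "treated", "treated", "treated"]
raw = [
    [100.0, 50.0, 10.0],
    [110.0, 52.0, 12.0],
    [90.0, 49.0, 11.0],
    [400.0, 51.0, 10.0],
    [420.0, 50.0, 13.0],
    [380.0, 53.0, 12.0],
]

processed = autoscale(log_transform(raw))
ranked = rank_features(processed, labels, names)
for name, t in ranked:
    print(f"{name}: t = {t:.2f}")
```

In this toy table only `peak_1` differs strongly between the two classes, so it should rank first. A real analysis would add p-values with multiple-testing correction and the multivariate and machine learning steps that the abstract also lists.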