Monnerie Stephanie, Petera Melanie, Lyan Bernard, Gaudreau Pierrette, Comte Blandine, Pujos-Guillot Estelle
Université Clermont Auvergne, INRA, UNH, Mapping, F-63000 Clermont Ferrand, France.
Université Clermont Auvergne, INRA, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, F-63000 Clermont-Ferrand, France.
Metabolites. 2019 Oct 24;9(11):250. doi: 10.3390/metabo9110250.
Metabolomics generates massive and complex data. Redundant analytical species and the high degree of correlation in datasets constrain both the use of data mining and statistical methods and the interpretation of results. In this context, we developed a new tool to detect analytical correlations in datasets without confounding them with biological correlations. Based on several parameters, such as a similarity measure, retention time, and mass information from known isotopes, adducts, or fragments, the algorithm groups features originating from the same analyte and proposes a single representative per group. To illustrate the functionalities and added value of this tool, it was applied to published datasets and compared to 'CAMERA', one of the most commonly used free packages offering a grouping method for metabolomics data. This tool was developed to be included in Galaxy and will be available in Workflow4Metabolomics (http://workflow4metabolomics.org). The source code is implemented in Perl and is freely available for download under the CeCILL 2.1 license at https://services.pfem.clermont.inra.fr/gitlab/grandpa/tool-acf.
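The grouping principle described in the abstract can be illustrated with a minimal sketch. It is not the authors' Perl implementation: the `Feature` class, the function names, the thresholds (`min_corr`, `rt_window`, `tol`), the choice of Pearson correlation as the similarity measure, the use of the most intense feature as representative, and the short list of mass differences are all assumptions made for illustration. Two features are linked only when they co-elute and their intensities correlate across samples; a match against a known mass difference can optionally be required as well.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import mean

# Illustrative mass differences in Da (assumed values, not the tool's own list).
KNOWN_MASS_DELTAS = {
    "13C isotope": 1.003355,
    "Na/H exchange adduct": 21.98194,
    "water loss fragment": 18.010565,
}


@dataclass
class Feature:
    name: str
    mz: float
    rt: float           # retention time in seconds
    intensities: list   # one intensity per sample


def pearson(x, y):
    """Pearson correlation of two equal-length intensity profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def match_mass_delta(f1, f2, tol=0.005):
    """Return the label of a known mass difference between two features, or None."""
    diff = abs(f1.mz - f2.mz)
    for label, delta in KNOWN_MASS_DELTAS.items():
        if abs(diff - delta) <= tol:
            return label
    return None


def group_features(features, min_corr=0.9, rt_window=6.0, require_mass_match=False):
    """Link co-eluting, correlated features and return the connected groups."""
    parent = list(range(len(features)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(features)), 2):
        a, b = features[i], features[j]
        if abs(a.rt - b.rt) > rt_window:
            continue  # different retention time: not the same analyte
        if pearson(a.intensities, b.intensities) < min_corr:
            continue  # intensity profiles do not co-vary
        if require_mass_match and match_mass_delta(a, b) is None:
            continue  # no known isotope/adduct/fragment relation
        parent[find(i)] = find(j)

    groups = {}
    for i, feature in enumerate(features):
        groups.setdefault(find(i), []).append(feature)
    return list(groups.values())


def representative(group):
    """Pick the feature with the highest mean intensity (an assumed criterion)."""
    return max(group, key=lambda f: mean(f.intensities))


# Made-up example data: one analyte seen as three ions, one co-eluting but
# uncorrelated feature, and one correlated feature at another retention time.
features = [
    Feature("M1", 181.0707, 120.0, [100, 200, 150, 400]),
    Feature("M1_13C", 182.0741, 120.5, [6.6, 13.0, 10.1, 26.5]),
    Feature("M1_Na", 203.0526, 121.0, [30, 62, 44, 121]),
    Feature("coeluting", 250.1000, 120.2, [300, 100, 500, 200]),
    Feature("bio_corr", 300.2000, 300.0, [50, 100, 75, 200]),
]
groups = group_features(features)
for group in groups:
    print([f.name for f in group], "representative:", representative(group).name)
```

In this toy dataset, `bio_corr` correlates perfectly with `M1` but elutes three minutes later, so it stays in its own group. This is how the retention-time criterion keeps a biological correlation from being mistaken for an analytical one.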