Biodata Mining Group, Faculty of Technology, Bielefeld University, Bielefeld, Germany; Bioinformatics Resource Facility, Center for Biotechnology, Bielefeld University, Bielefeld, Germany.
Department of Safety and Quality of Cereals, Max Rubner-Institut, Detmold, Germany.
Front Bioeng Biotechnol. 2015 Mar 24;3:35. doi: 10.3389/fbioe.2015.00035. eCollection 2015.
We present the results of a machine learning approach to classifying GC-MS data from wheat grains grown under different farming systems. The aim is to investigate whether learning algorithms can classify GC-MS data as originating from conventionally or organically grown samples, while taking different cultivars into account. The work is motivated by the increased demand for organic food in post-industrialized societies and the resulting need to prove organic food authenticity. The data set comprises up to 11 wheat cultivars that were cultivated in both farming systems, organic and conventional, over 3 years. More than 300 GC-MS measurements were recorded and subsequently processed and analyzed in the MeltDB 2.0 metabolomics analysis platform, which is briefly outlined in this paper. We further describe how unsupervised (t-SNE, PCA) and supervised (SVM) methods can be applied for sample visualization and classification. Our results clearly show that the year of cultivation has the greatest influence on the metabolic composition of a sample and the wheat cultivar the second greatest. We also show that, for a given year and cultivar, organic and conventional cultivation can be distinguished by machine learning algorithms.
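The supervised step described in the abstract, separating organic from conventional samples within a single year and cultivar, might look roughly like the sketch below. It is an illustration only, not the authors' pipeline. The data are synthetic stand-ins for GC-MS feature vectors. The linear kernel, the label encoding, the hold-out split, and every function and parameter name are assumptions made for the example. The classifier is a linear soft-margin SVM fitted by stochastic subgradient descent on the regularised hinge loss, written with the Python standard library only.

```python
import random


def make_stratum(n_per_class, n_features, shift, rng):
    """Synthetic stand-in for the GC-MS feature vectors of one year and cultivar.

    Labels: +1 = organic, -1 = conventional (hypothetical encoding).
    """
    samples = []
    for label in (1, -1):
        for _ in range(n_per_class):
            features = [rng.gauss(label * shift, 1.0) for _ in range(n_features)]
            samples.append((features, label))
    rng.shuffle(samples)
    return samples


def decision_value(w, b, x):
    """Signed score w.x + b of the linear decision function."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(samples, lam=0.01, lr=0.01, epochs=100, rng=None):
    """Fit a linear soft-margin SVM by stochastic subgradient descent
    on the L2-regularised hinge loss."""
    rng = rng or random.Random(0)
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x, y in order:
            violated = y * decision_value(w, b, x) < 1
            # The regularisation term always shrinks w; the hinge term acts
            # only when the sample is inside the margin or misclassified.
            w = [wi - lr * lam * wi for wi in w]
            if violated:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def accuracy(w, b, samples):
    """Fraction of samples whose predicted sign matches the label."""
    correct = sum(
        1 for x, y in samples
        if (1 if decision_value(w, b, x) >= 0 else -1) == y
    )
    return correct / len(samples)


# One synthetic (year, cultivar) stratum: 30 organic and 30 conventional samples.
rng = random.Random(42)
data = make_stratum(n_per_class=30, n_features=8, shift=1.0, rng=rng)
train, test = data[:40], data[40:]
w, b = train_linear_svm(train, rng=rng)
test_accuracy = accuracy(w, b, test)
print(f"hold-out accuracy: {test_accuracy:.2f}")
```

Training within one stratum reflects the abstract's finding that year and cultivar dominate the metabolic composition: pooling all samples would let those larger effects mask the farming-system signal. In practice one would more likely use an established SVM implementation and cross-validation than this hand-written trainer.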