RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan.
Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan.
Sci Rep. 2018 Feb 21;8(1):3426. doi: 10.1038/s41598-018-20121-w.
Computer-based technological innovation has advanced sophisticated and diverse analytical instruments, making it relatively easy to collect massive amounts of data. This has been accompanied by fast-growing demand for better data mining methods for analyzing big data derived from chemical and biological systems. From this perspective, relying on a general "linear" multivariate analysis alone limits interpretation, because metabolic data from living organisms contain "non-linear" variations. Here we describe a kernel principal component analysis (KPCA)-incorporated analytical approach for extracting useful information from metabolic profiling data. To overcome the difficulty of determining important variables (metabolites), we incorporated a random forest conditional variable importance measure into our KPCA-based analytical approach to show the relative importance of metabolites. In a market basket analysis, hippurate, the most important variable identified by the importance measure, was associated with high levels of some vitamins and minerals in foods eaten the previous day, suggesting a relationship between increased hippurate and intake of a wide variety of vegetables and fruits. Therefore, the KPCA-incorporated analytical approach described herein enabled us to capture input-output responses and should be useful not only for metabolic profiling but also for profiling in other areas of biological and environmental systems.
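As an illustration of the KPCA step only, the following is a minimal sketch in plain Python. It rests on several assumptions not stated in the abstract: an RBF kernel with an arbitrary `gamma`, synthetic two-ring data standing in for metabolite profiles, and hypothetical helper names (`rbf_kernel_matrix`, `center_kernel`, `leading_eigenpair`). It is not the authors' pipeline, and the random forest conditional importance and market basket steps are not shown.

```python
import math
import random


def rbf_kernel_matrix(X, gamma):
    """K[i][j] = exp(-gamma * ||x_i - x_j||^2) (RBF kernel is an assumption)."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


def center_kernel(K):
    """Double-center K so the implicit feature vectors have zero mean."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]  # equals column means (K is symmetric)
    grand_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def matvec(A, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def leading_eigenpair(Kc, iters=300, seed=0):
    """Power iteration for the largest eigenvalue and a unit eigenvector of Kc."""
    start = random.Random(seed)
    v = [start.uniform(-1.0, 1.0) for _ in range(len(Kc))]
    for _ in range(iters):
        w = matvec(Kc, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, matvec(Kc, v)))  # Rayleigh quotient
    return eigval, v


# Synthetic, non-linearly structured data: two noisy concentric rings.
noise = random.Random(42)
X = []
for radius in (1.0, 3.0):
    for k in range(30):
        angle = 2.0 * math.pi * k / 30
        X.append((radius * math.cos(angle) + noise.gauss(0.0, 0.05),
                  radius * math.sin(angle) + noise.gauss(0.0, 0.05)))

K = rbf_kernel_matrix(X, gamma=0.5)   # gamma chosen arbitrarily for the sketch
Kc = center_kernel(K)
eigval, v = leading_eigenpair(Kc)

# Score of each sample on the first kernel principal component.
scores = [math.sqrt(eigval) * x for x in v]
print("leading eigenvalue:", eigval)
print("first five scores:", scores[:5])
```

In practice one would extract several components with a numerical library and tune the kernel and its parameters. A single component via power iteration is used here only to keep the sketch dependency-free.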