Haddad Isam, Hiller Karsten, Frimmersdorf Eliane, Benkert Beatrice, Schomburg Dietmar, Jahn Dieter
Technische Universität Braunschweig, Institute of Microbiology, Braunschweig, Germany.
In Silico Biol. 2009;9(4):163-78.
Modern high-throughput techniques allow for the identification and quantification of hundreds of metabolites of a biological system, which cover central parts of the metabolome. Due to the amount and complexity of the obtained data, there is an increasing need for the development of appropriate computational interpretation methods. A novel data analysis pipeline designed for metabolomic data determined by high-throughput methods is presented. The combination of principal component analysis (PCA) with emergent self-organizing maps (ESOM) and hierarchical cluster analysis (HCA) algorithms is used to unravel the structure underlying metabolomic data sets, including the detection of outliers. Observed differences between the analyzed metabolomes are automatically mapped and visualized using KEGG metabolic pathway maps. In this way, typical metabolic biomarkers for data sets from the analyzed growth conditions and genetic backgrounds become visible. In order to validate the described methods, we analyzed time-resolved metabolomic data sets obtained for Corynebacterium glutamicum cells grown on various carbon sources, consisting of 126 different metabolic patterns. The analysis pipeline was implemented in the user-friendly Java software eSOMet. The software was successfully used for the clustering of the metabolome data mentioned above. Metabolic biomarkers typical for the utilized carbon sources and analyzed growth phases were identified.
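The hierarchical cluster analysis step named in the abstract can be illustrated with a minimal sketch. It is not the eSOMet implementation, which is written in Java and also includes PCA, ESOM and KEGG mapping, all omitted here. The sketch assumes autoscaling (z-scoring) of each metabolite, Euclidean distance and average linkage. The abstract does not specify any of these choices. The function names and the toy metabolite values are hypothetical.

```python
import math
from statistics import mean, pstdev


def autoscale(samples):
    """Z-score each metabolite (column) so that no single metabolite dominates the distances."""
    columns = list(zip(*samples))
    means = [mean(c) for c in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid division by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in samples]


def euclidean(a, b):
    """Euclidean distance between two metabolic patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(
                    euclidean(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: 6 samples (rows) x 3 metabolites (columns),
# imitating two growth conditions with distinct metabolic patterns.
samples = [
    [10.0, 1.0, 5.0],
    [11.0, 1.2, 5.2],
    [10.5, 0.9, 4.9],
    [2.0, 8.0, 9.0],
    [2.5, 7.5, 9.4],
    [1.8, 8.2, 8.8],
]

scaled = autoscale(samples)
clusters = average_linkage(scaled, n_clusters=2)
print(clusters)  # sample indices grouped by cluster
```

With this toy input the two simulated conditions separate into two clusters of sample indices. A sample that only joins the tree at a very large merge distance would be a candidate outlier, which is one simple way to read the outlier detection mentioned in the abstract.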