Zou Wei, Tolstikov Vladimir V
UC Davis Genome Center, University of California, Davis, CA 95616, USA.
Rapid Commun Mass Spectrom. 2008 Apr;22(8):1312-24. doi: 10.1002/rcm.3507.
Six different clones of 1-year-old loblolly pine (Pinus taeda L.) seedlings grown under standardized conditions in a greenhouse were used for sample preparation and further analysis. Three independent and complementary analytical techniques for metabolic profiling were applied in the present study: hydrophilic interaction chromatography (HILIC-LC/ESI-MS), reversed-phase liquid chromatography (RP-LC/ESI-MS), and gas chromatography (GC/TOF-MS), each coupled to mass spectrometry. Unsupervised methods, such as principal component analysis (PCA) and clustering, and supervised methods, such as classification, were used for data mining. A genetic algorithm (GA), a multivariate approach, was evaluated for selecting the smallest subsets of potentially discriminative classifiers. From more than 2000 peaks found in total, GA selected small subsets of high-potential classifiers that allowed discrimination among the six investigated genotypes. Annotated GC/TOF-MS data allowed the generation of a small subset of identified metabolites. The LC/ESI-MS data and small subsets require further annotation. The present study demonstrated that the combination of comprehensive metabolic profiling and advanced data mining techniques provides a powerful metabolomic approach for biomarker discovery among small molecules. Using GA for feature selection allowed the generation of small subsets of potent classifiers.
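The GA-based feature selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the classifier, the fitness function, or the GA settings. The sketch assumes a leave-one-out nearest-centroid classifier, a fitness that subtracts a small penalty per selected peak to favor small subsets, and synthetic data in place of real peak intensities. All function names and parameters (`ga_select`, `penalty`, `make_toy_data`, and so on) are hypothetical.

```python
import random


def nearest_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen features."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Build class centroids from every sample except the held-out one.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1
        # Assign the held-out sample to the closest centroid (squared Euclidean distance).
        best_label, best_dist = None, float("inf")
        for label, acc in sums.items():
            dist = sum((X[i][f] - acc[k] / counts[label]) ** 2
                       for k, f in enumerate(features))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def fitness(mask, X, y, penalty=0.01):
    """Accuracy minus a per-feature penalty (assumed form) so small subsets win ties."""
    features = [i for i, bit in enumerate(mask) if bit]
    return nearest_centroid_accuracy(X, y, features) - penalty * len(features)


def ga_select(X, y, pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    """Evolve boolean feature masks; return the indices selected by the best mask."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() < 0.2 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged (elitism), refill with mutated offspring.
        elite = sorted(pop, key=lambda m: fitness(m, X, y), reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not bit) if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda m: fitness(m, X, y))
    return [i for i, bit in enumerate(best) if bit]


def make_toy_data(n_classes=3, per_class=6, n_features=20, seed=1):
    """Synthetic stand-in for a peak table: features 0 and 1 separate the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        for _ in range(per_class):
            row = [rng.gauss(0, 1) for _ in range(n_features)]
            row[0] += 5 * c
            row[1] -= 5 * c
            X.append(row)
            y.append(c)
    return X, y
```

Calling `ga_select(*make_toy_data())` returns a list of column indices. With a real peak table, `X` would hold one row per sample and one column per detected peak, and `y` the genotype labels. The penalty weight controls the trade-off between classification accuracy and subset size and would need tuning for real data.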