Research Center of Modernization of Traditional Chinese Medicines, Central South University, Changsha, PR China.
Anal Chim Acta. 2011 Nov 7;706(1):97-104. doi: 10.1016/j.aca.2011.08.025. Epub 2011 Sep 12.
Data from high-throughput metabolomics experiments are becoming increasingly large and complex, which poses considerable challenges for existing statistical modeling methods. There is therefore a need for statistically efficient approaches to mining the underlying metabolite information contained in the metabolomics data under investigation. In this work, we developed a novel kernel Fisher discriminant analysis (KFDA) algorithm by constructing an informative kernel based on a decision tree ensemble. The constructed kernel can effectively encode the similarities between metabolomics samples with respect to informative metabolites/biomarkers in specific parts of the measurement space. At the same time, informative metabolites or potential biomarkers can be discovered through variable importance ranking during kernel construction. Moreover, with such a kernel, KFDA can also deal with nonlinear relationships in the metabolomics data to some extent. Finally, two real metabolomics datasets and one simulated dataset were used to demonstrate the performance of the proposed approach in comparison with other approaches.
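The general idea of pairing a tree-ensemble kernel with KFDA can be illustrated with a minimal sketch. The abstract does not specify how the kernel is built, so everything below is an assumption rather than the authors' method: the kernel is taken to be the random-forest proximity (the fraction of trees in which two samples fall into the same leaf), KFDA is the standard two-class formulation with a small ridge term, and the data are synthetic, with features 0 and 1 playing the role of informative metabolites. The function names `tree_proximity_kernel`, `fit_kfda`, and `predict_kfda` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def tree_proximity_kernel(forest, A, B):
    """K[i, j] = fraction of trees in which A[i] and B[j] share a leaf.

    Assumption: this proximity stands in for the paper's kernel.
    """
    leaves_a = forest.apply(A)  # shape (n_a, n_trees): leaf index per tree
    leaves_b = forest.apply(B)
    return (leaves_a[:, None, :] == leaves_b[None, :, :]).mean(axis=2)


def fit_kfda(K, y, reg=1e-3):
    """Two-class kernel Fisher discriminant on a precomputed kernel K."""
    n = K.shape[0]
    classes = np.unique(y)
    assert len(classes) == 2, "this sketch handles two classes only"
    means = []
    N = np.zeros((n, n))  # within-class scatter in kernel space
    for c in classes:
        Kc = K[:, y == c]  # kernel columns of class c, shape (n, n_c)
        n_c = Kc.shape[1]
        means.append(Kc.mean(axis=1))
        centering = np.eye(n_c) - np.full((n_c, n_c), 1.0 / n_c)
        N += Kc @ centering @ Kc.T
    # Ridge term `reg` keeps the linear system well conditioned.
    alpha = np.linalg.solve(N + reg * np.eye(n), means[1] - means[0])
    threshold = 0.5 * alpha @ (means[0] + means[1])
    return alpha, threshold, classes


def predict_kfda(K_new, alpha, threshold, classes):
    """K_new has shape (n_new, n_train); project and threshold."""
    scores = K_new @ alpha
    return np.where(scores > threshold, classes[1], classes[0])


# Synthetic stand-in for a metabolomics table: 120 samples, 20 variables,
# of which only variables 0 and 1 differ between the two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 60)
X = rng.normal(size=(120, 20))
X[y == 1, :2] += 2.5

order = rng.permutation(120)
train, test = order[:80], order[80:]

forest = RandomForestClassifier(
    n_estimators=200, min_samples_leaf=3, random_state=0
).fit(X[train], y[train])

K_train = tree_proximity_kernel(forest, X[train], X[train])
K_test = tree_proximity_kernel(forest, X[test], X[train])

alpha, threshold, classes = fit_kfda(K_train, y[train])
accuracy = np.mean(predict_kfda(K_test, alpha, threshold, classes) == y[test])

# Variable importance ranking comes from the same ensemble that built the kernel.
top_features = np.argsort(forest.feature_importances_)[::-1][:2]
print("held-out accuracy:", accuracy)
print("top-ranked variables:", top_features)
```

Because the kernel and the importance ranking come from the same fitted ensemble, candidate biomarkers are obtained as a by-product of kernel construction, which is the coupling the abstract describes. A real analysis would tune the ensemble and the ridge term by cross-validation.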