Research Center of Modernization of Traditional Chinese Medicines, Central South University, Changsha, 410083, P. R. China.
Analyst. 2011 Mar 7;136(5):947-54. doi: 10.1039/c0an00383b. Epub 2010 Dec 15.
High-throughput metabolomics experiments produce large and increasingly complex datasets, which pose a number of challenges for existing statistical modeling methods. There is therefore a need for a statistically efficient approach to mining the underlying metabolite information contained in metabolomics data. In this work, we propose a new strategy, termed the MCTree approach, that couples Monte Carlo cross validation with the classification tree algorithm. By establishing many cross-predictive models, the MCTree approach provides a feasible way to uncover the predictive structure of metabolomics data. The sample proximity matrix obtained from these models appears to offer useful insights into the data. At the same time, informative metabolites or potential biomarkers can be discovered by means of variable importance ranking in the MCTree approach. Finally, two real metabolomics datasets are used to demonstrate the performance of the proposed approach.
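The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' MCTree code. It assumes Gini-impurity splits, a 70/30 random split per Monte Carlo run, proximity defined as the fraction of runs in which two samples land in the same terminal node, and importance defined as the accumulated impurity decrease per variable. All function names (`gini`, `best_split`, `grow`, `find_leaf`, `mctree`) and default parameter values are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, rows):
    """Return (gain, feature, threshold) of the best Gini split, or None."""
    parent = gini([y[i] for i in rows])
    best = None
    for j in range(len(X[0])):
        values = sorted({X[i][j] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [y[i] for i in rows if X[i][j] <= t]
            right = [y[i] for i in rows if X[i][j] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            gain = parent - child
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, j, t)
    return best


def grow(X, y, rows, depth, importance):
    """Grow a tree on the given rows; add weighted Gini gains to importance."""
    split = best_split(X, y, rows) if depth > 0 else None
    if split is None:  # terminal node: predict the majority class
        return {"label": Counter(y[i] for i in rows).most_common(1)[0][0]}
    gain, j, t = split
    importance[j] += gain * len(rows)
    left = [i for i in rows if X[i][j] <= t]
    right = [i for i in rows if X[i][j] > t]
    return {"feature": j, "threshold": t,
            "left": grow(X, y, left, depth - 1, importance),
            "right": grow(X, y, right, depth - 1, importance)}


def find_leaf(node, x):
    """Follow the splits until sample x reaches a terminal node."""
    while "label" not in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


def mctree(X, y, n_runs=200, train_fraction=0.7, max_depth=3, seed=0):
    """Monte Carlo cross validation with classification trees (sketch).

    Returns (proximity matrix, normalized variable importance, test error).
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    n_train = int(train_fraction * n)
    counts = [[0] * n for _ in range(n)]
    importance = [0.0] * p
    errors = tested = 0
    for _ in range(n_runs):
        rows = list(range(n))
        rng.shuffle(rows)  # random train/test partition for this run
        train, test = rows[:n_train], rows[n_train:]
        tree = grow(X, y, train, max_depth, importance)
        leaves = [find_leaf(tree, X[i]) for i in range(n)]
        errors += sum(leaves[i]["label"] != y[i] for i in test)
        tested += len(test)
        for a in range(n):
            for b in range(n):
                if leaves[a] is leaves[b]:  # same terminal node in this run
                    counts[a][b] += 1
    proximity = [[c / n_runs for c in row] for row in counts]
    total = sum(importance) or 1.0
    return proximity, [v / total for v in importance], errors / tested
```

Calling `mctree(X, y)` with `X` as a list of per-sample metabolite intensity lists and `y` as the class labels returns the three quantities the abstract refers to: a sample proximity matrix that can be inspected for group structure or outlying samples, a variable ranking for candidate biomarkers, and a pooled cross-predictive error rate. The exhaustive split search is quadratic in the number of training samples per variable, so this sketch is only suited to small illustrative datasets.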