Biomolecular Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, Sir Alexander Fleming Building, South Kensington, London SW7 2AZ, UK.
Anal Chim Acta. 2011 Oct 31;705(1-2):72-80. doi: 10.1016/j.aca.2011.04.016. Epub 2011 Apr 20.
Linear multivariate projection methods are frequently applied for predictive modeling of spectroscopic data in metabonomic studies. The OPLS method is a commonly used computational procedure for characterizing spectral metabonomic data, largely due to its favorable model interpretation properties providing separate descriptions of predictive variation and response-orthogonal structured noise. However, when the relationship between descriptor variables and the response is non-linear, conventional linear models will perform sub-optimally. In this study we have evaluated to what extent a non-linear model, kernel-based orthogonal projections to latent structures (K-OPLS), can provide enhanced predictive performance compared to the linear OPLS model. Just like its linear counterpart, K-OPLS provides separate model components for predictive variation and response-orthogonal structured noise. The improved model interpretation by this separate modeling is a property unique to K-OPLS in comparison to other kernel-based models. Simulated annealing (SA) was used for effective and automated optimization of the kernel-function parameter in K-OPLS (SA-K-OPLS). Our results reveal that the non-linear K-OPLS model provides improved prediction performance in three separate metabonomic data sets compared to the linear OPLS model. We also demonstrate how response-orthogonal K-OPLS components provide valuable biological interpretation of model and data. The metabonomic data sets were acquired using proton Nuclear Magnetic Resonance (NMR) spectroscopy, and include a study of the liver toxin galactosamine, a study of the nephrotoxin mercuric chloride and a study of Trypanosoma brucei brucei infection. Automated and user-friendly procedures for the kernel-optimization have been incorporated into version 1.1.1 of the freely available K-OPLS software package for both R and Matlab to enable easy application of K-OPLS for non-linear prediction modeling.
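The kernel-parameter search described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the kernel is Gaussian with a single width parameter `sigma`, kernel ridge regression stands in for the K-OPLS model, and the cooling schedule, step size and fold assignment are arbitrary choices. The function names (`gaussian_kernel`, `cv_error`, `anneal_sigma`) are hypothetical and do not correspond to the K-OPLS package interface.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def cv_error(X, y, sigma, ridge=1e-3, n_folds=5):
    """Cross-validated mean squared error for a given kernel width.

    Assumption: kernel ridge regression is used here as a simple
    stand-in for K-OPLS; it is not the model evaluated in the paper.
    """
    folds = np.arange(len(y)) % n_folds
    sq_err = 0.0
    for f in range(n_folds):
        tr, te = folds != f, folds == f
        K = gaussian_kernel(X[tr], X[tr], sigma)
        alpha = np.linalg.solve(K + ridge * np.eye(int(tr.sum())), y[tr])
        pred = gaussian_kernel(X[te], X[tr], sigma) @ alpha
        sq_err += ((pred - y[te]) ** 2).sum()
    return sq_err / len(y)

def anneal_sigma(X, y, sigma0=1.0, n_iter=200, t0=1.0, cooling=0.97, seed=0):
    """Simulated annealing over log(sigma); returns (best_sigma, best_error)."""
    rng = np.random.default_rng(seed)
    log_s = np.log(sigma0)
    err = cv_error(X, y, sigma0)
    best_s, best_err = sigma0, err
    temp = t0
    for _ in range(n_iter):
        # Propose a random step in log-space, kept within a safe range.
        cand = float(np.clip(log_s + rng.normal(scale=0.5), -10.0, 10.0))
        cand_err = cv_error(X, y, np.exp(cand))
        # Metropolis rule: always accept improvements; accept a worse
        # candidate with probability exp(-increase / temperature).
        if cand_err < err or rng.random() < np.exp(-(cand_err - err) / temp):
            log_s, err = cand, cand_err
            if err < best_err:
                best_s, best_err = float(np.exp(log_s)), err
        temp *= cooling  # geometric cooling schedule (an assumed choice)
    return best_s, best_err
```

Calling `anneal_sigma(X, y)` on a descriptor matrix `X` (samples in rows) and a response vector `y` returns the kernel width with the lowest cross-validated error encountered during the search. Searching in log-space keeps `sigma` positive and lets a single step size cover several orders of magnitude.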