Radboud University Nijmegen, Institute for Molecules and Materials, Heyendaalseweg, The Netherlands.
BMC Bioinformatics. 2011 Jun 22;12:254. doi: 10.1186/1471-2105-12-254.
Analysis of cerebrospinal fluid (CSF) samples holds great promise for diagnosing neurological pathologies and for gaining insight into their molecular background. Proteomics and metabolomics methods provide invaluable information on the biomolecular content of CSF and thereby on the possible status of the central nervous system, including neurological pathologies. Combining the two sources of information provides a more complete description of CSF content. Extracting the full combined information requires a joint analysis of the different datasets, i.e. fusion of the data.
A novel fusion method is presented and applied to proteomics and metabolomics data from a pre-clinical model of multiple sclerosis: an Experimental Autoimmune Encephalomyelitis (EAE) model in rats. The method follows a mid-level fusion architecture. The relevant information is first extracted per platform using extended canonical variates analysis. The per-platform results are then merged and analyzed jointly. We find that the combined proteome and metabolome data allow efficient and reliable discrimination between healthy rats, peripherally inflamed rats, and rats at the onset of EAE. The prediction accuracy reaches 89% on a test set. The important variables (metabolites and proteins) in this model are known to be linked to EAE and/or multiple sclerosis.
Fusion of proteomics and metabolomics data is possible. The main issues of high dimensionality and missing values are overcome. Fusion yields higher prediction accuracy and a more exhaustive description of the disease profile. The biological interpretation of the variables involved validates our fusion approach.
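The mid-level fusion architecture described above (extract features per platform, merge them, then classify jointly) can be illustrated with a minimal sketch. It is not the authors' implementation: plain PCA stands in for extended canonical variates analysis, a nearest-centroid rule stands in for the final classifier, the data are synthetic, and missing values are not handled. All function names (`fit_block`, `fuse`, `run_demo`, and so on) and all sizes are hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_block(X, n_components):
    """Learn a per-platform projection from training data.

    ASSUMPTION: PCA (via SVD) is used here as a simple stand-in for
    extended canonical variates analysis.
    """
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def transform_block(X, model):
    """Project one platform's samples onto its learned components."""
    mean, loadings = model
    return (X - mean) @ loadings


def fuse(blocks, models):
    """Mid-level fusion: concatenate per-platform scores column-wise."""
    return np.hstack([transform_block(X, m) for X, m in zip(blocks, models)])


def fit_centroids(Z, y):
    """Compute one centroid per class in the fused score space."""
    classes = np.unique(y)
    centroids = np.vstack([Z[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_nearest_centroid(Z, classes, centroids):
    """Assign each sample to the class with the closest centroid."""
    dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dist.argmin(axis=1)]


def make_block(rng, y, n_features, shift=4.0):
    """Synthetic platform data: each class shifts its own 10 features."""
    X = rng.normal(size=(y.size, n_features))
    for c in np.unique(y):
        X[y == c, 10 * c:10 * (c + 1)] += shift
    return X


def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(3), 20)      # three groups, 20 samples each
    proteomics = make_block(rng, y, 200)  # hypothetical feature counts
    metabolomics = make_block(rng, y, 50)

    # Hold out the last 5 samples of each group as a test set.
    test = np.tile(np.arange(20) >= 15, 3)
    train = ~test
    blocks_train = [proteomics[train], metabolomics[train]]
    blocks_test = [proteomics[test], metabolomics[test]]

    # Per-platform models are fitted on training samples only.
    models = [fit_block(X, 3) for X in blocks_train]
    Z_train = fuse(blocks_train, models)
    Z_test = fuse(blocks_test, models)

    classes, centroids = fit_centroids(Z_train, y[train])
    y_pred = predict_nearest_centroid(Z_test, classes, centroids)
    accuracy = float((y_pred == y[test]).mean())
    return accuracy, Z_train.shape


if __name__ == "__main__":
    accuracy, fused_shape = run_demo()
    print("fused training matrix shape:", fused_shape)
    print("test accuracy on synthetic data:", accuracy)
```

Fitting each per-platform projection on training samples only keeps the test-set estimate honest. With real data, the per-platform scores would typically also be scaled before concatenation so that no single platform dominates the fused space.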