Dossat Nadège, Mangé Alain, Solassol Jérôme, Jacot William, Lhermitte Ludovic, Maudelonde Thierry, Daurès Jean-Pierre, Molinari Nicolas
IURC, Department of Biostatistic, Epidemiology and Clinical Research, Montpellier, France.
Cancer Inform. 2007 Jul 19;3:295-305.
A key challenge in clinical proteomics of cancer is the identification of biomarkers that could allow detection, diagnosis and prognosis of the disease. Recent advances in mass spectrometry and proteomic instrumentation offer a unique opportunity to identify these markers rapidly. These advances also pose considerable challenges, similar to those created by microarray-based investigation, for the discovery of patterns of markers, specific to each pathologic state (e.g. normal vs cancer), from high-dimensional data. We propose a three-step strategy to select important markers from high-dimensional mass spectrometry data generated with surface enhanced laser desorption/ionization (SELDI) technology. The first two steps select the most discriminating biomarkers through the construction of different classifiers. In the third step, we compare and validate the performance and robustness of these classifiers using different supervised classification methods such as Support Vector Machine, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Neural Networks, Classification Trees and Boosting Trees. We show that the proposed method is suitable for analysing high-throughput proteomics data and that the combination of logistic regression and Linear Discriminant Analysis outperforms the other methods tested.
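As a rough illustration of the general workflow described above (marker selection followed by a comparison of supervised classifiers), the following is a minimal sketch and not the authors' implementation. It assumes scikit-learn, uses a synthetic matrix as a stand-in for SELDI peak intensities, and assumes that selection keeps the 10 peaks with the largest logistic regression coefficients; the value 10, the 5-fold cross-validation and the helper name `make_selector` are hypothetical choices made for the example. Only a subset of the classifiers named in the abstract is included.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a peak-intensity matrix: 120 spectra x 500 peaks,
# of which 10 carry class information (assumption, not real SELDI data).
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=10,
    n_redundant=0, random_state=0,
)


def make_selector(k=10):
    """Keep the k peaks with the largest |coefficient| in a logistic model.

    Hypothetical selection rule chosen for this sketch.
    """
    model = LogisticRegression(max_iter=1000)
    # threshold=-inf means only max_features limits the selection.
    return SelectFromModel(model, max_features=k, threshold=-np.inf)


# Candidate classifiers applied to the selected markers.
classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "SVM": SVC(kernel="linear"),
    "Tree": DecisionTreeClassifier(random_state=0),
    "Boosting": GradientBoostingClassifier(random_state=0),
}

# Selection sits inside the pipeline so that it is refit on each training
# fold; selecting on the full data first would bias the accuracy estimate.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, clf in classifiers.items():
    pipe = make_pipeline(StandardScaler(), make_selector(), clf)
    results[name] = cross_val_score(pipe, X, y, cv=cv).mean()

# Report mean cross-validated accuracy, best first.
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {acc:.3f}")
```

Rankings obtained on synthetic data say nothing about the results reported in the paper; the sketch only shows how selection and classification can be chained and compared under one cross-validation scheme.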