Hedjazi Lyamine, Le Lann Marie-Veronique, Kempowsky Tatiana, Dalenc Florence, Aguilar-Martin Joseph, Favre Gilles
CNRS, LAAS, Toulouse, France.
J Comput Biol. 2013 Aug;20(8):610-20. doi: 10.1089/cmb.2012.0249.
Microarray profiling has recently raised hopes of gaining new insights into breast cancer biology and thereby improving the performance of current prognostic tools. However, the resulting data, characterized mainly by high dimensionality and a low signal-to-noise ratio, pose serious challenges to classical data analysis techniques. Although extensive research has addressed the first challenge within the feature selection framework, very little attention has been paid to the second. In this article, we propose to address both issues simultaneously by using symbolic data analysis to derive more accurate genetic marker-based prognostic models. In particular, an interval data representation is employed to model various uncertainties in microarray measurements. A recent feature selection algorithm that handles symbolic interval data is then used to derive a genetic signature. The predictive value of the derived signature is assessed under a rigorous experimental setup and compared with existing prognostic approaches in terms of predictive performance and estimated survival probability. The derived signature (GenSym) is shown to perform significantly better than other prognostic models, including the 70-gene signature, St. Gallen, and National Institutes of Health criteria.
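The abstract does not say how the intervals are constructed, so the following is only a minimal sketch of what an interval representation of noisy expression measurements could look like. It assumes, purely for illustration, that each gene has several replicate measurements per sample and that the interval is the mean plus or minus a multiple of the sample standard deviation. The function names, the `width` parameter, and the gene labels are hypothetical and are not taken from the article.

```python
from statistics import mean, stdev


def to_interval(replicates, width=1.0):
    """Turn replicate measurements of one gene into a (lower, upper) interval.

    Assumption (not from the article): the interval is centred on the mean
    and extends `width` sample standard deviations on each side.
    """
    center = mean(replicates)
    # A single measurement carries no spread information: degenerate interval.
    spread = stdev(replicates) if len(replicates) > 1 else 0.0
    return (center - width * spread, center + width * spread)


def to_interval_profile(sample, width=1.0):
    """Map {gene: [replicate values]} to {gene: (lower, upper)}."""
    return {gene: to_interval(values, width) for gene, values in sample.items()}


# Hypothetical sample with two genes; values are made up for illustration.
sample = {"GENE_A": [1.0, 3.0], "GENE_B": [0.5, 0.5, 0.5]}
profile = to_interval_profile(sample)
for gene, (lower, upper) in profile.items():
    print(f"{gene}: [{lower:.3f}, {upper:.3f}]")
```

A feature selection method for symbolic interval data would then operate on these (lower, upper) pairs rather than on single point values, so that noisier measurements are represented by wider intervals.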