The Center for Bioinformatics and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, 200241, China.
BMC Genomics. 2011 Dec 23;12 Suppl 5(Suppl 5):S3. doi: 10.1186/1471-2164-12-S5-S3.
Microarray data have been used for gene signature selection to predict clinical outcomes. Many studies have attempted to identify factors that affect model performance, with little success. Fine-tuning model parameters and optimizing each step of the modeling process often result in over-fitting without improving performance.
We propose a quantitative measurement, termed consistency degree, to detect the correlation between a disease endpoint and the gene expression profile. Different endpoints were shown to have different consistency degrees with gene expression profiles. The validity of this measurement as an estimate of consistency was tested and found significant, with a p-value less than 2.2e-16 for all of the studied endpoints. According to the consistency degree score, we propose extending the overall survival milestone outcome for multiple myeloma from 730 days to 1561 days, which is more consistent with the gene expression profile.
For various clinical endpoints, the maximum predictive power of different microarray-based models is limited by the correlation between the endpoint and the gene expression profile of disease samples, as indicated by the consistency degree score. In addition, previously defined clinical outcomes can be reassessed and refined to be more coherent with the related disease gene expression profile. Our findings point to an entirely new direction for assessing microarray-based predictive models and provide important information for gene signature-based clinical applications.