Guo Lan, Ma Yan, Ward Rebecca, Castranova Vince, Shi Xianglin, Qian Yong
Mary Babb Randolph Cancer Center, Department of Community Medicine, West Virginia University, Morgantown, West Virginia 26506-9300, USA.
Clin Cancer Res. 2006 Jun 1;12(11 Pt 1):3344-54. doi: 10.1158/1078-0432.CCR-05-2336.
Individualized therapy of lung adenocarcinoma depends on accurately classifying patients into subgroups of poor and good prognosis, which reflect different probabilities of disease recurrence and survival following therapy. However, it is currently impossible to reliably identify specific high-risk patients. Here, we propose a computational model system that accurately predicts the clinical outcome of individual patients based on their gene expression profiles.
Gene signatures were selected using three feature selection algorithms: random forests, correlation-based feature selection, and gain ratio attribute selection. Prediction models were built using random committee and Bayesian belief networks. The prognostic power of the survival predictors was also evaluated using hierarchical cluster analysis and Kaplan-Meier analysis.
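As an illustration of one of the named selection criteria, the following is a minimal sketch of gain ratio attribute selection on discretized expression values. The gene names, the "high"/"low" discretization, and the toy outcome labels are hypothetical and are not taken from the study; the abstract does not state which implementation or discretization the authors used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature, target):
    """Gain ratio = information gain / split information for a discrete feature."""
    n = len(target)
    # Group the class labels by feature value.
    groups = {}
    for x, y in zip(feature, target):
        groups.setdefault(x, []).append(y)
    # Conditional entropy H(target | feature).
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(target) - conditional
    # Split information is the entropy of the feature itself.
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: six patients, two discretized genes.
outcome = ["poor", "poor", "good", "good", "poor", "good"]
genes = {
    "gene_a": ["high", "high", "low", "low", "high", "low"],  # separates outcomes
    "gene_b": ["high", "low", "high", "low", "high", "low"],  # weakly informative
}

# Rank genes by gain ratio, most informative first.
ranked = sorted(genes, key=lambda g: gain_ratio(genes[g], outcome), reverse=True)
print(ranked)
```

In this sketch a gene whose discretized expression splits the patients cleanly by outcome receives a higher score, so a signature could be formed by keeping the top-ranked genes.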
The predictive accuracy of an identified 37-gene survival signature is 0.96, as measured by the area under the time-dependent receiver operating characteristic curves. Cluster analysis using the 37-gene signature groups the patient samples into three clusters with distinct prognoses (Kaplan-Meier analysis, P < 0.0005, log-rank test). All patients in cluster 1 were in stage I, with N0 lymph node status (no metastasis) and smaller tumor size (T1 or T2). Additionally, a 12-gene signature correctly predicts the stage of 94.2% of patients.
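For readers unfamiliar with Kaplan-Meier analysis, the following is a minimal sketch of the product-limit estimator behind such survival curves. The follow-up times and event flags are made-up values for illustration only, not data from the study, and the function name `kaplan_meier` is an assumption of this sketch.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each observed death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up in months; 0 marks a censored observation.
months = [6, 12, 12, 20, 30]
observed = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(months, observed):
    print(f"t={t:>2}  S(t)={s:.2f}")
```

Computing one such curve per cluster and comparing the curves with a log-rank test is the kind of analysis the reported P value refers to.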
Our results show that prediction models based on the expression levels of a small number of marker genes could accurately predict patient outcome for individualized therapy of lung adenocarcinoma. Such individualized treatment may significantly increase survival by optimizing treatment procedures and may improve lung cancer survival in each year up to the 5-year checkpoint.