Department of Hematology and Medical Oncology, Emory University, Winship Cancer Institute, 1365 Clifton Road NE, Rm C-3090, Atlanta, GA 30322, USA.
Biomed Eng Online. 2011 Nov 8;10:97. doi: 10.1186/1475-925X-10-97.
Statistical learning (SL) techniques can address non-linear relationships and small datasets but do not provide an output that has an epidemiologic interpretation.
A small set of clinical variables (CVs) for stage-1 non-small cell lung cancer patients was used to evaluate an approach for using SL methods as a preprocessing step for survival analysis. A probabilistic neural network (PNN) was trained stochastically using differential evolution (DE) optimization. Survival scores were derived stochastically by combining CVs with the PNN. Patients (n = 151) were dichotomized into favorable (n = 92) and unfavorable (n = 59) survival outcome groups. These PNN-derived scores were used with logistic regression (LR) modeling to predict favorable survival outcome and were integrated into the survival analysis (i.e., Kaplan-Meier analysis and Cox regression). The hybrid modeling was compared with the respective modeling using raw CVs. The area under the receiver operating characteristic curve (Az) was used to compare model predictive capability. Odds ratios (ORs) and hazard ratios (HRs), each with 95% confidence intervals (CIs), were used to compare disease associations.
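The scoring and evaluation steps described here can be illustrated with a minimal sketch. It assumes a Gaussian Parzen-window form of the PNN with a single fixed smoothing parameter `sigma`, leave-one-out scoring, and Az computed as the Mann-Whitney statistic. The DE optimization is omitted, the data are synthetic, and all function names are hypothetical; the abstract does not specify these details, so this is not the authors' implementation.

```python
import math
import random

def pnn_score(x, train_X, train_y, sigma):
    """Score in [0, 1] for class 1 from Gaussian Parzen kernels (assumed PNN form)."""
    kernel_sum = [0.0, 0.0]  # summed kernel responses for class 0 and class 1
    for xi, yi in zip(train_X, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        kernel_sum[yi] += math.exp(-d2 / (2.0 * sigma ** 2))
    total = kernel_sum[0] + kernel_sum[1]
    # With priors equal to class frequencies, the posterior reduces to this ratio.
    return kernel_sum[1] / total if total > 0.0 else 0.5

def leave_one_out_scores(X, y, sigma):
    """Score each sample using all other samples as the PNN pattern layer."""
    scores = []
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        scores.append(pnn_score(X[i], rest_X, rest_y, sigma))
    return scores

def auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ordered pairs."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic two-feature data (hypothetical, not the study's patients).
rng = random.Random(0)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
     + [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(40)])
y = [0] * 40 + [1] * 40

scores = leave_one_out_scores(X, y, sigma=1.0)
print("leave-one-out Az:", round(auc(scores, y), 3))
```

In a hybrid workflow of the kind described, a score like this would then enter the LR or Cox model as an ordinary covariate, which is what allows ORs and HRs to be reported for it.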
The LR model with the best predictive capability using raw CVs gave Az = 0.703. While controlling for gender and tumor grade, the OR for favorable outcome was 0.63 (CI: 0.43, 0.91) per standard deviation (SD) increase in age, indicating that increasing age confers an unfavorable outcome. The hybrid LR model gave Az = 0.778 by combining age and tumor grade with the PNN and controlling for gender. The PNN score and age translate inversely with respect to risk. The OR = 0.27 (CI: 0.14, 0.53) per SD increase in PNN score indicates that patients with lower scores have an unfavorable outcome. The tumor-grade-adjusted hazard for patients above the median age compared with those below the median was HR = 1.78 (CI: 1.06, 3.02), whereas the hazard for patients below the median PNN score compared with those above the median was HR = 4.0 (CI: 2.13, 7.14).
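The per-SD odds ratios above follow the usual relation between a logistic regression coefficient and its OR. The sketch below assumes Wald-type 95% intervals (z = 1.96) and a covariate already standardized to unit SD; the abstract does not state how the intervals were computed, and the helper names are hypothetical. It recovers an approximate coefficient and standard error from a reported OR and CI, and inverts a ratio to express it in the opposite direction.

```python
import math

Z95 = 1.96  # normal quantile for a two-sided 95% interval (assumed Wald CI)

def or_with_ci(beta, se, z=Z95):
    """Odds ratio and CI from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def beta_se_from_or_ci(odds_ratio, lower, upper, z=Z95):
    """Approximate coefficient and standard error implied by a reported OR and CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se

def invert_ratio(ratio, lower, upper):
    """Express a ratio in the opposite direction; the CI bounds swap places."""
    return 1.0 / ratio, 1.0 / upper, 1.0 / lower

# Reported age effect: OR = 0.63 (CI: 0.43, 0.91) per SD increase in age.
beta_age, se_age = beta_se_from_or_ci(0.63, 0.43, 0.91)
print("age: beta = %.3f, se = %.3f" % (beta_age, se_age))
print("age: round trip = (%.2f, %.2f, %.2f)" % or_with_ci(beta_age, se_age))

# Reported PNN effect: OR = 0.27 (CI: 0.14, 0.53) per SD increase in score.
# Inverted, this is the OR of favorable outcome per SD *decrease* in age-free score.
print("PNN per SD decrease: (%.2f, %.2f, %.2f)" % invert_ratio(0.27, 0.14, 0.53))
```

The round trip reproduces the reported interval only approximately, because the published values are rounded to two decimals. Inverting the PNN odds ratio gives roughly 3.7, the odds ratio for favorable outcome per SD decrease in score, which makes the size of the PNN effect easier to compare with the age effect.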
We have provided preliminary evidence that SL preprocessing may provide benefits over accepted approaches. The work will require further evaluation with varying datasets to confirm these findings.