Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX 77230, USA.
J Natl Cancer Inst. 2013 Sep 4;105(17):1284-91. doi: 10.1093/jnci/djt202. Epub 2013 Aug 20.
Methods that use cell line microarray and drug sensitivity data to predict patients' response to chemotherapy are appealing, but the groups that develop them may be reluctant to release methodological details because they wish to protect intellectual property. Here we describe a case study in which predictions were validated while the methods were treated as a "black box."
Medical Prognosis Institute (MPI) constructed cell-line-derived sensitivity scores (SSs) and combined scores (CSs) that incorporate clinical variables. MD Anderson researchers evaluated their predictions. We searched the Gene Expression Omnibus (GEO) to identify validation datasets, and we performed statistical evaluation of the agreement between prediction and clinical observation.
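As an illustration of how agreement between a black-box score and a binary clinical outcome might be assessed, the following minimal sketch computes the area under the ROC curve (AUC) and a permutation P value for its departure from chance. The abstract does not state which statistical tests were used, so this choice of technique is an assumption; the function names and the toy data are hypothetical and are not taken from the study.

```python
import random


def auc(scores, outcomes):
    """Probability that a random responder outscores a random non-responder.

    Ties count as 0.5. `outcomes` holds 1 (response) or 0 (no response).
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, outcomes, n_perm=2000, seed=0):
    """Two-sided permutation P value for the AUC differing from 0.5.

    Outcome labels are shuffled to break any link with the scores.
    """
    rng = random.Random(seed)
    observed = abs(auc(scores, outcomes) - 0.5)
    shuffled = list(outcomes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(auc(scores, shuffled) - 0.5) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical toy data: eight patients, four responders.
toy_scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.2]
toy_outcomes = [1, 1, 0, 1, 0, 0, 1, 0]
print("AUC:", auc(toy_scores, toy_outcomes))  # 0.75
print("permutation P:", permutation_p_value(toy_scores, toy_outcomes))
```

Because the score is used only through its ranks, an evaluation of this kind needs nothing beyond the submitted predictions and the observed outcomes, which suits a setting where the scoring method itself is not disclosed.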
We identified three suitable datasets: GSE16446 (n = 120; binary outcome), GSE17920 (n = 130; binary outcome), and GSE10255 (n = 161; continuous and time-to-event outcomes). The SS was statistically significantly associated with primary treatment response in all three studies (GSE16446: P = .02; GSE17920: P = .02; GSE10255: P = .02). Dichotomized SSs performed no better than chance for GSE16446 and GSE17920, and categorized SSs did not predict disease-free survival in GSE10255. SSs sometimes improved on predictions based on clinical variables alone (GSE16446: P = .05; GSE17920: P = .31; GSE10255: P = .045), but gains were limited (the 95% confidence intervals for GSE16446 and GSE17920 included 0). The CS did not predict treatment response for GSE16446 (P = .55), but it did for GSE17920 (P < .001). The coefficients of clinical variables that MPI provided for the CSs agreed better with estimates from GSE17920 than with estimates from GSE16446.
Model predictions were better than chance in all three datasets. However, these scores added little to existing clinical predictors; statistically significant contributions were likely to be too small to change clinical practice. These findings suggest that discovering better predictors will require both cell line data and a clinical training dataset of patient samples.