Department of Radiation Oncology, Washington University School of Medicine, Saint Louis, MO 63110, USA.
Acta Oncol. 2010 Nov;49(8):1363-73. doi: 10.3109/02841861003649224. Epub 2010 Mar 2.
Tumor control probability (TCP) in radiotherapy is determined by complex interactions between tumor biology, tumor microenvironment, radiation dosimetry, and patient-related variables. The complexity of these heterogeneous variable interactions makes it challenging to build predictive models for routine clinical practice. We describe a datamining framework that can unravel the higher-order relationships among dosimetric dose-volume prognostic variables, interrogate various radiobiological processes, and generalize to unseen data when applied prospectively.
Several datamining approaches are discussed, including dose-volume metrics, equivalent uniform dose, a mechanistic Poisson model, and model-building methods using statistical regression and machine learning techniques. Institutional datasets of non-small cell lung cancer (NSCLC) patients were used to demonstrate these methods. The performance of the different methods was evaluated using bivariate Spearman rank correlations (rs). Over-fitting was controlled via resampling methods.
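As a rough illustration of the two mechanistic quantities named above, the following sketch computes a cell-kill equivalent uniform dose and a Poisson TCP from a differential dose-volume histogram. It assumes the common textbook forms with purely exponential cell kill, SF(D) = exp(-alpha * D), and a uniform clonogen density; the abstract does not state the exact parameterization the authors used. The function names, the value of `alpha`, the clonogen number, and the DVH bins are all hypothetical.

```python
import math


def mean_surviving_fraction(doses_gy, volume_fractions, alpha):
    """Volume-weighted mean surviving fraction, assuming SF(D) = exp(-alpha * D)."""
    total = sum(volume_fractions)
    return sum(
        (v / total) * math.exp(-alpha * d)
        for d, v in zip(doses_gy, volume_fractions)
    )


def cell_kill_eud(doses_gy, volume_fractions, alpha):
    """Uniform dose that yields the same mean surviving fraction as the DVH."""
    return -math.log(mean_surviving_fraction(doses_gy, volume_fractions, alpha)) / alpha


def poisson_tcp(doses_gy, volume_fractions, alpha, n_clonogens):
    """Poisson TCP: probability that no clonogen survives."""
    expected_survivors = n_clonogens * mean_surviving_fraction(
        doses_gy, volume_fractions, alpha
    )
    return math.exp(-expected_survivors)


if __name__ == "__main__":
    # Hypothetical differential DVH: dose bins (Gy) and fractional volumes.
    doses = [60.0, 70.0, 75.0]
    volumes = [0.1, 0.3, 0.6]
    alpha = 0.3          # assumed radiosensitivity (1/Gy), illustrative only
    n_clonogens = 1e7    # assumed clonogen number, illustrative only
    print("EUD (Gy):", cell_kill_eud(doses, volumes, alpha))
    print("TCP:", poisson_tcp(doses, volumes, alpha, n_clonogens))
```

In this form a uniform dose distribution returns that dose as its EUD, and any cold spot pulls the EUD below the mean dose, which is why the metric is sensitive to under-dosed tumor subvolumes.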
Using a dataset of 56 patients with primary NSCLC tumors and 23 candidate variables, we identified GTV volume and V75 as the best model parameters for predicting TCP, using statistical resampling and a logistic model. With these variables, the support vector machine (SVM) kernel method provided superior performance for TCP prediction, with rs=0.68 on leave-one-out testing, compared to logistic regression (rs=0.4), Poisson-based TCP (rs=0.33), and the cell kill equivalent uniform dose model (rs=0.17).
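The evaluation scheme described here, leave-one-out testing scored by Spearman rank correlation, can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the helper names are hypothetical, the six-patient dataset is invented for illustration, and the nearest-centroid scorer is a deliberately simple stand-in for the SVM and logistic models compared in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rs: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def leave_one_out_scores(features, outcomes, fit):
    """Refit on all patients but one, then score the held-out patient."""
    scores = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = outcomes[:i] + outcomes[i + 1:]
        predictor = fit(train_x, train_y)
        scores.append(predictor(features[i]))
    return scores


def fit_centroid_scorer(train_x, train_y):
    """Hypothetical stand-in model: higher score means closer to controlled cases."""
    def centroid(label):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    controlled, failed = centroid(1), centroid(0)

    def score(x):
        d_failed = sum((a - b) ** 2 for a, b in zip(x, failed))
        d_controlled = sum((a - b) ** 2 for a, b in zip(x, controlled))
        return d_failed - d_controlled

    return score


if __name__ == "__main__":
    # Invented toy data: [GTV volume, V75] per patient; 1 = tumor controlled.
    X = [[20, 0.9], [35, 0.8], [150, 0.3], [120, 0.4], [40, 0.85], [200, 0.2]]
    y = [1, 1, 0, 0, 1, 0]
    held_out = leave_one_out_scores(X, y, fit_centroid_scorer)
    print("leave-one-out rs:", spearman_rs(held_out, y))
```

Because each patient is scored by a model that never saw that patient, the resulting rs estimates performance on unseen data rather than goodness of fit, which is the sense in which the rs values above are comparable across methods.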
The prediction of treatment response can be improved by utilizing datamining approaches, which are able to unravel important non-linear complex interactions among model variables and have the capacity to predict on unseen data for prospective clinical applications.