El Naqa Issam, Bradley Jeffrey D, Lindsay Patricia E, Hope Andrew J, Deasy Joseph O
Washington University, Saint Louis, MO, USA.
Phys Med Biol. 2009 Sep 21;54(18):S9-S30. doi: 10.1088/0031-9155/54/18/S02. Epub 2009 Aug 18.
Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture the potential complexity of heterogeneous variable interactions and to remain applicable beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to previously unseen data. In this work, we evaluate several types of linear and nonlinear kernels for generating interaction terms and approximating the treatment-response function. Institutional datasets for esophagitis, pneumonitis and xerostomia endpoints were used as examples. Furthermore, an independent Radiation Therapy Oncology Group (RTOG) dataset was used for 'generalizability' validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principal component analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of the esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizability beyond institutional data, in contrast with other models.
This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among model variables. These models have the capacity to predict on unseen data.
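The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' modified SVM and uses none of their clinical data. Everything in it is an assumption made for illustration: the synthetic two-ring dataset stands in for an endpoint with nonlinear structure, scikit-learn's standard RBF-kernel `SVC` stands in for the kernel method, and Spearman rank correlation is chosen as one possible bivariate correlation between leave-one-out scores and outcomes.

```python
from scipy.stats import spearmanr
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in data, NOT clinical data. Two concentric
# rings give a binary "outcome" that no linear boundary can separate.
X, y = make_circles(n_samples=120, noise=0.1, factor=0.4, random_state=0)


def loo_scores(model, X, y):
    """Return one decision score per sample, each from a model that was
    trained with that sample held out (leave-one-out testing)."""
    return cross_val_predict(
        model, X, y, cv=LeaveOneOut(), method="decision_function"
    )


# Baseline: logistic regression on the raw (standardized) variables.
linear_model = make_pipeline(StandardScaler(), LogisticRegression())

# Kernel method: a standard RBF-kernel SVM (hyperparameters are
# illustrative defaults, not values taken from the paper).
kernel_model = make_pipeline(
    StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")
)

linear_scores = loo_scores(linear_model, X, y)
kernel_scores = loo_scores(kernel_model, X, y)

# Rank correlation between held-out scores and observed outcomes.
linear_rs, _ = spearmanr(linear_scores, y)
kernel_rs, _ = spearmanr(kernel_scores, y)

print(f"logistic regression, leave-one-out Spearman rs: {linear_rs:.2f}")
print(f"RBF-kernel SVM,      leave-one-out Spearman rs: {kernel_rs:.2f}")
```

On data of this shape the kernel model should rank held-out cases far better than the linear baseline, which mirrors the qualitative claim in the abstract. The gap depends entirely on how nonlinear the data are, so the reported 21% and 60% improvements cannot be read off a toy example like this one.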