Valdes Gilmer, Solberg Timothy D, Heskel Marina, Ungar Lyle, Simone Charles B
Department of Radiation Oncology, Perelman Center for Advanced Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Phys Med Biol. 2016 Aug 21;61(16):6105-20. doi: 10.1088/0031-9155/61/16/6105. Epub 2016 Jul 27.
To develop a patient-specific 'big data' clinical decision tool to predict pneumonitis in stage I non-small cell lung cancer (NSCLC) patients after stereotactic body radiation therapy (SBRT). Sixty-one features were recorded for 201 consecutive patients with stage I NSCLC treated with SBRT, of whom 8 (4.0%) developed radiation pneumonitis. Pneumonitis thresholds were found for each feature individually using decision stumps. The performance of three algorithms (Decision Trees, Random Forests, RUSBoost) was evaluated. Learning curves were generated, and the training error was compared with the testing error to evaluate which factors are needed to obtain a cross-validated error smaller than 0.1. The factors considered were adding new features, increasing the complexity of the algorithm, and enlarging the sample size and number of events. In the univariate analysis, the most important feature selected was the diffusion capacity of the lung for carbon monoxide (DLCO adj%). On multivariate analysis, the three most important features selected were the dose to 15 cc of the heart, the dose to 4 cc of the trachea or bronchus, and race. Higher accuracy could be achieved when the RUSBoost algorithm was used with regularization. To predict radiation pneumonitis with an error smaller than 10%, we estimate that a sample size of 800 patients is required. Clinically relevant thresholds that put patients at risk of developing radiation pneumonitis were determined in a cohort of 201 stage I NSCLC patients treated with SBRT. The consistency of these thresholds can provide radiation oncologists with an estimate of their reliability and may inform treatment planning and patient counseling. The accuracy of the classification is limited by the number of patients in the study, not by the features gathered or the complexity of the algorithm.
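As an illustration of the per-feature thresholding step, the following is a minimal sketch of a decision stump: a one-split tree that picks the single cut point on one feature that best separates events from non-events. It is not the authors' implementation. The Gini impurity criterion, the function names `gini` and `fit_stump`, and the toy feature values and labels are all assumptions made for this example; the abstract does not state which split criterion or software was used.

```python
from typing import List, Optional, Tuple


def gini(pos: int, total: int) -> float:
    """Gini impurity of a node holding `pos` events among `total` cases."""
    if total == 0:
        return 0.0
    p = pos / total
    return 2.0 * p * (1.0 - p)


def fit_stump(values: List[float],
              labels: List[int]) -> Tuple[Optional[float], float]:
    """Find the best single split `x <= t` for one feature.

    Candidate thresholds are midpoints between consecutive distinct
    sorted values. Returns (threshold, weighted Gini impurity), or
    (None, parent impurity) if no split lowers the impurity.
    Split criterion (Gini) is an assumption, not taken from the source.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(label for _, label in pairs)
    best_threshold: Optional[float] = None
    best_impurity = gini(total_pos, n)
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot place a cut between identical values
        left_n, right_n = i, n - i
        impurity = (left_n * gini(left_pos, left_n)
                    + right_n * gini(total_pos - left_pos, right_n)) / n
        if impurity < best_impurity:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best_impurity = impurity
    return best_threshold, best_impurity


# Hypothetical toy data: one numeric feature and a 0/1 event label.
toy_values = [40, 45, 50, 55, 70, 75, 80, 85, 90, 95]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

threshold, impurity = fit_stump(toy_values, toy_labels)
print(threshold, impurity)
```

On this toy data the stump places the cut at 62.5, midway between the highest value with an event and the next value without one. With very few events, as in the cohort described above (8 of 201), such a threshold can shift noticeably when a single case is added or removed, which is one reason to check its consistency across resampled subsets.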