The D-lab: Decision Support for Precision Medicine, GROW - School for Oncology and Developmental Biology, Maastricht University Medical Centre+, Universiteitssingel 40, 6229 ER, Maastricht, The Netherlands.
Department of Radiation Oncology, GROW, School for Oncology and Developmental Biology, Maastricht University Medical Center, Maastricht, The Netherlands.
Med Phys. 2018 Jul;45(7):3449-3459. doi: 10.1002/mp.12967. Epub 2018 Jun 13.
Machine learning classification algorithms (classifiers) for prediction of treatment response are becoming more popular in the radiotherapy literature. The general machine learning literature provides evidence in favor of some classifier families (random forest, support vector machine, gradient boosting) in terms of classification performance. The purpose of this study is to compare such classifiers specifically for (chemo)radiotherapy datasets and to estimate their average discriminative performance for radiation treatment outcome prediction.
We collected 12 datasets (3496 patients) from prior studies on post-(chemo)radiotherapy toxicity, survival, or tumor control with clinical, dosimetric, or blood biomarker features from multiple institutions and for different tumor sites, that is, (non-)small-cell lung cancer, head and neck cancer, and meningioma. Six common classification algorithms with built-in feature selection (decision tree, random forest, neural network, support vector machine, elastic net logistic regression, LogitBoost) were applied to each dataset using the popular open-source R package caret. The R code and documentation for the analysis are available online (https://github.com/timodeist/classifier_selection_code). All classifiers were run on each dataset in a 100-times repeated nested fivefold cross-validation with hyperparameter tuning. Performance metrics (AUC, calibration slope and intercept, accuracy, Cohen's kappa, and Brier score) were computed. We ranked classifiers by AUC to determine which classifier is likely to also perform well in future studies. We simulated the benefit to prospective investigators of selecting a classifier for a new dataset based on our study (preselection based on other datasets) or of estimating the best classifier for that dataset (set-specific selection based on information from the new dataset), each compared with uninformed classifier selection (random selection).
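The repeated nested cross-validation and the AUC metric described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' R/caret code (which is in the linked repository). The function names, the unstratified shuffling, and the fixed seed are assumptions made for the example; the abstract does not say whether folds were stratified.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_folds(indices, k, rng):
    """Shuffle the indices and split them into k nearly equal folds."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv_splits(n, k_outer=5, k_inner=5, repeats=100, seed=0):
    """Yield (repeat, outer_train, outer_test, inner_splits) for n samples.

    inner_splits is a list of (inner_train, inner_validation) pairs drawn
    only from outer_train, so hyperparameter tuning never sees outer_test.
    Assumption: folds are unstratified random splits.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        outer = k_folds(list(range(n)), k_outer, rng)
        for i, test in enumerate(outer):
            train = [j for f, fold in enumerate(outer) if f != i for j in fold]
            inner = k_folds(train, k_inner, rng)
            inner_splits = [
                ([j for g, fold in enumerate(inner) if g != v for j in fold], val)
                for v, val in enumerate(inner)
            ]
            yield r, train, test, inner_splits
```

In such a scheme, hyperparameters would be tuned on the inner splits, the model refit on `outer_train`, and `auc` computed on `outer_test`; averaging over the outer folds and repeats gives one performance estimate per classifier and dataset.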
Random forest (best in 6/12 datasets) and elastic net logistic regression (best in 4/12 datasets) showed the overall best discrimination, but there was no single best classifier across datasets. Both classifiers had a median AUC rank of 2. Preselection and set-specific selection each yielded a significant average AUC improvement of 0.02 over random selection, with average AUC rank improvements of 0.42 and 0.66, respectively.
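To make the rank and preselection figures concrete, the sketch below ranks classifiers by AUC within each dataset, takes the median rank, and estimates the gain of preselection over random selection in a leave-one-dataset-out fashion. The AUC values are hypothetical toy numbers, not results from the study, and choosing the classifier with the highest mean AUC on the other datasets is an assumed selection rule that may differ from the authors' procedure.

```python
from statistics import mean, median

# Hypothetical AUCs for illustration only; NOT the values reported in the study.
aucs = {
    "dataset_A": {"rf": 0.70, "glmnet": 0.68, "svm": 0.60},
    "dataset_B": {"rf": 0.64, "glmnet": 0.65, "svm": 0.61},
    "dataset_C": {"rf": 0.75, "glmnet": 0.72, "svm": 0.73},
}

def ranks(row):
    """Rank classifiers within one dataset: 1 = highest AUC (ties not handled)."""
    ordered = sorted(row, key=row.get, reverse=True)
    return {clf: i + 1 for i, clf in enumerate(ordered)}

def median_ranks(table):
    """Median AUC rank of each classifier across all datasets."""
    per_dataset = [ranks(row) for row in table.values()]
    return {clf: median(r[clf] for r in per_dataset) for clf in per_dataset[0]}

def preselection_gain(table):
    """Leave-one-dataset-out estimate of the AUC gain from preselection.

    For each held-out dataset, pick the classifier with the best mean AUC on
    the remaining datasets and compare its held-out AUC with the expected AUC
    of a uniformly random classifier choice on that dataset.
    """
    gains = []
    for held_out, row in table.items():
        others = [r for name, r in table.items() if name != held_out]
        chosen = max(row, key=lambda clf: mean(r[clf] for r in others))
        gains.append(row[chosen] - mean(row.values()))
    return mean(gains)

print(median_ranks(aucs))
print(round(preselection_gain(aucs), 3))
```

Set-specific selection would instead pick the classifier using only data from the new dataset, for example by its inner cross-validation performance, which is why it needs more computation than preselection.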
Random forest and elastic net logistic regression yield higher discriminative performance in (chemo)radiotherapy outcome and toxicity prediction than the other classifiers studied. Thus, one of these two classifiers should be the first choice for investigators building classification models, or the benchmark against which they compare their own modeling results. Our results also show that an informed preselection of classifiers based on existing datasets can improve discrimination over random selection.