Reber Brandon, Van Dijk Lisanne, Anderson Brian, Mohamed Abdallah Sherif Radwan, Fuller Clifton, Lai Stephen, Brock Kristy
Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, Texas.
University of Groningen, Groningen, Netherlands.
Adv Radiat Oncol. 2022 Dec 27;8(4):101163. doi: 10.1016/j.adro.2022.101163. eCollection 2023 Jul-Aug.
Deep-learning (DL) techniques have been successful in disease-prediction tasks and could improve the prediction of mandible osteoradionecrosis (ORN) resulting from head and neck cancer (HNC) radiation therapy. In this study, we retrospectively compared the performance of DL algorithms and traditional machine-learning (ML) techniques in predicting the binary mandible ORN outcome in a large cohort of patients with HNC.
Patients who received HNC radiation therapy at the University of Texas MD Anderson Cancer Center from 2005 to 2015 were identified for the ML (n = 1259) and DL (n = 1236) studies. Patients were followed for at least 12 months for ORN development; 173 developed ORN and 1086 had no evidence of ORN. The ML models used dose-volume histogram (DVH) parameters to predict ORN development and comprised logistic regression, random forest, and support vector machine models, with a random classifier serving as a reference. The DL models were based on ResNet, DenseNet, and autoencoder-based architectures and used each patient's dose distribution cropped to the mandible. To evaluate how the amount of available training data affected DL prediction performance, the DL models were trained on increasing fractions of the original training data.
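As a rough illustration of the kind of dose-volume histogram (DVH) parameters that can serve as ML inputs, the Python sketch below computes V_x (percent of the mandible volume receiving at least x Gy), D_x (minimum dose to the hottest x% of the volume), and mean dose from a list of voxel doses. The function names, the chosen thresholds, and the equal-voxel-volume assumption are illustrative only; the abstract does not list the study's exact DVH feature set.

```python
import math
from typing import Dict, Sequence

# Hypothetical sketch: assumes every voxel inside the mandible has equal volume,
# so volume fractions reduce to voxel counts. Thresholds are illustrative.


def v_x(doses: Sequence[float], threshold_gy: float) -> float:
    """Percent of the structure volume receiving at least `threshold_gy` Gy."""
    if not doses:
        raise ValueError("structure contains no voxels")
    hits = sum(1 for d in doses if d >= threshold_gy)
    return 100.0 * hits / len(doses)


def d_x(doses: Sequence[float], percent_volume: float) -> float:
    """Minimum dose (Gy) received by the hottest `percent_volume`% of the structure."""
    if not doses:
        raise ValueError("structure contains no voxels")
    if not 0 < percent_volume <= 100:
        raise ValueError("percent_volume must be in (0, 100]")
    ranked = sorted(doses, reverse=True)
    # Smallest number of hottest voxels that covers at least percent_volume%.
    n = max(1, math.ceil(len(ranked) * percent_volume / 100.0))
    return ranked[n - 1]


def dvh_features(doses: Sequence[float]) -> Dict[str, float]:
    """Illustrative DVH feature vector for one patient's mandible voxels."""
    if not doses:
        raise ValueError("structure contains no voxels")
    features: Dict[str, float] = {"mean_gy": sum(doses) / len(doses)}
    for t in (30, 50, 60, 70):  # hypothetical V_x thresholds (Gy)
        features[f"V{t}"] = v_x(doses, t)
    for p in (2, 50, 98):  # hypothetical D_x volume levels (%)
        features[f"D{p}"] = d_x(doses, p)
    return features
```

For a ten-voxel toy structure with doses `[10, 20, 30, 40, 50, 60, 70, 65, 55, 45]`, `dvh_features` gives a mean dose of 44.5 Gy, V50 = 50.0%, and D2 = 70 Gy. One such feature dictionary per patient would form a row of the tabular input to a classifier such as logistic regression.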
Logistic regression, the best-performing ML model, achieved an F1 score of 0.3. The best-performing ResNet, DenseNet, and autoencoder-based models had F1 scores of 0.07, 0.14, and 0.23, respectively, compared with 0.17 for the random classifier. Increasing the amount of training data available to the DL models produced no apparent performance improvement.
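Because ORN is relatively uncommon in this cohort (173 of 1259 patients, about 13.7%), F1 scores are easiest to interpret against chance-level baselines. The sketch below computes F1 from confusion counts, plus the F1 implied by the expected counts of a random classifier that flags each patient as ORN with probability q, independently of the true outcome. This random-classifier model is an assumption for illustration; the abstract does not describe how its random reference was constructed or the class balance of the evaluation set.

```python
# Sketch: F1 from confusion counts, and F1 of a hypothetical random baseline.
# Cohort counts (173 ORN, 1086 no ORN) come from the abstract; the baseline
# model (flag positive with probability q, independent of outcome) is assumed.


def f1_score(tp: float, fp: float, fn: float) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def random_baseline_f1(n_pos: int, n_neg: int, q: float) -> float:
    """F1 computed from the expected confusion counts of a random classifier."""
    tp = q * n_pos  # positives flagged by chance
    fp = q * n_neg  # negatives flagged by chance
    fn = (1 - q) * n_pos  # positives missed
    return f1_score(tp, fp, fn)


if __name__ == "__main__":
    n_orn, n_no_orn = 173, 1086
    prevalence = n_orn / (n_orn + n_no_orn)
    for q in (prevalence, 0.5, 1.0):
        print(f"q={q:.3f}  F1={random_baseline_f1(n_orn, n_no_orn, q):.3f}")
```

With the whole-cohort counts, q equal to the prevalence gives F1 of about 0.137, q = 0.5 gives about 0.216, and flagging every patient (q = 1) gives about 0.242. The reported random-classifier F1 of 0.17 falls within this range, although its exact value depends on the sampling strategy and the evaluation set used.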
The ML models outperformed their DL counterparts. The lack of improvement in DL performance with additional training data suggests either that more data are needed to construct appropriate DL models or that the image features used by the DL models are not suitable for this task.
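The training-fraction experiment behind this conclusion can be pictured as drawing progressively larger subsets of the training set. The Python sketch below is hypothetical: the abstract states only that the DL models were trained on increasing ratios of the original training data, so class stratification, nesting of subsets, the fractions, and the random seed are all assumptions chosen for illustration.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical sketch: nested, class-stratified training subsets for a learning
# curve. Stratification, nesting, fractions, and seed are assumptions.


def stratified_fraction_subsets(
    labels: Sequence[int],
    fractions: Sequence[float],
    seed: int = 0,
) -> Dict[float, List[int]]:
    """Return, for each fraction, the training indices to use (nested, stratified)."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for idxs in by_class.values():
        rng.shuffle(idxs)  # one fixed permutation per class keeps subsets nested

    subsets: Dict[float, List[int]] = {}
    for frac in sorted(fractions):
        if not 0 < frac <= 1:
            raise ValueError("fractions must be in (0, 1]")
        chosen: List[int] = []
        for idxs in by_class.values():
            k = max(1, round(frac * len(idxs)))  # keep at least one example per class
            chosen.extend(idxs[:k])
        subsets[frac] = sorted(chosen)
    return subsets


if __name__ == "__main__":
    # Toy labels with roughly the cohort's class imbalance (1 = ORN).
    labels = [1] * 14 + [0] * 86
    for frac, idx in stratified_fraction_subsets(labels, [0.25, 0.5, 1.0]).items():
        n_pos = sum(labels[i] for i in idx)
        print(f"{frac:.2f}: {len(idx)} patients, {n_pos} with ORN")
```

Keeping the subsets nested means each larger subset adds patients to the smaller ones rather than redrawing them, which helps separate the effect of added data from differences between independent random draws.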