Samara Kamel A, Al Aghbari Zaher, Abusafia Amani
College of Medicine, University of Sharjah, Sharjah, United Arab Emirates.
Department of Computer Science, University of Sharjah, Sharjah, United Arab Emirates.
Health Inf Sci Syst. 2021 Jan 12;9(1):5. doi: 10.1007/s13755-020-00134-4. eCollection 2021 Dec.
Glioblastoma is one of the most common and aggressive brain tumors worldwide and carries a poor prognosis. A glioblastoma prognostication model has the potential to improve the standard of care for this cancer. No previous study has used ensemble learning with a population database to predict multiple binary glioblastoma survival outcomes.
We used ensemble learning to design, build, and test a glioblastoma prognostication system for short-, intermediate-, and long-term survival, based on various clinical features. We used the population database SEER, which covers 17 registries. The most important prognostic features were identified and used as a clinical feature set. A statistical feature set was determined using Random Forests. We report accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUROC), positive predictive value (PPV), and negative predictive value (NPV).
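The six reported metrics can be illustrated with a minimal sketch. This is not the authors' code: the function names `binary_metrics` and `auroc` are hypothetical, labels are assumed to be coded 1 (positive class) and 0 (negative class), and the sketch assumes both classes occur among the true labels and among the predictions, so no denominator is zero.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for binary labels coded 1 (positive) and 0 (negative).

    Assumes each class appears in both y_true and y_pred (no zero denominators).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUROC above is quadratic in the number of cases and is meant only to make the definition concrete; a rank-based or library implementation would be preferable for a registry-sized dataset.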
Statistically determined feature sets had the best performance. All the top models for short-, intermediate-, and long-term survival were random forests. For short-term survival, the top model had AUROC = 0.937, accuracy = 86%, specificity = 88%, sensitivity = 85%, NPV = 85%, and PPV = 87%. For long-term survival, the top model had AUROC = 0.893, accuracy = 81%, specificity = 79%, sensitivity = 83%, NPV = 82%, and PPV = 79%. For intermediate-term survival, the top model had AUROC = 0.780, and all other metrics were at least 70%.
Our ensemble models performed well, achieving AUROCs as high as 0.94, which highlights the importance of balancing, ensemble techniques, and statistical feature selection. After external validation, our models could potentially be used by clinicians.