Unit of Cell and Molecular Biology, Dundee Dental School, University of Dundee, Dundee, UK.
Department of Oral Surgery, Medicine and Pathology, Dundee Dental School, University of Dundee, Dundee, UK.
J Oral Pathol Med. 2021 Apr;50(4):378-384. doi: 10.1111/jop.13135. Epub 2020 Dec 15.
BACKGROUND/AIM: Machine learning analyses of outcomes in oral cancer remain sparse compared with those for other cancers such as breast or lung cancer. The purpose of the present study was to compare the performance of machine learning algorithms in predicting global, recurrence-free five-year survival in oral cancer patients from clinical and histopathological data.
METHODS: Data were gathered retrospectively from 416 patients with oral squamous cell carcinoma (OSCC). The data set was divided into training and test sets (75:25 split). The training performance of five machine learning algorithms (Logistic regression, K-nearest neighbours, Naïve Bayes, Decision tree and Random forest classifiers) was assessed by k-fold cross-validation. The variables used in the models were age, sex, pain symptoms, grade of lesion, lymphovascular invasion, extracapsular extension, perineural invasion, bone invasion and type of treatment. Variable importance was assessed, and model performance on the test set was evaluated using receiver operating characteristic curves, accuracy, sensitivity, specificity and F1 score.
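The workflow described above (75:25 split, k-fold cross-validation of five classifiers on the training set, then evaluation on the held-out test set) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for the 416 patient records, and picks k = 5, stratified splitting, default hyperparameters and a Gaussian Naïve Bayes variant, none of which are stated in the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in data (416 rows, 9 predictors), not the study data.
X, y = make_classification(n_samples=416, n_features=9, n_informative=5,
                           random_state=0)

# 75:25 train/test split; stratification is an assumption, not stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The five classifier families named in the abstract, with default settings.
models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "K-nearest neighbours": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
}

# ASSUMPTION: k = 5; the abstract only says "k-fold".
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

cv_scores = {}
test_metrics = {}
for name, model in models.items():
    # Training performance: mean cross-validated accuracy on the training set.
    cv_scores[name] = cross_val_score(model, X_train, y_train, cv=cv,
                                      scoring="accuracy").mean()
    # Test performance: fit on the full training set, score on held-out data.
    y_pred = model.fit(X_train, y_train).predict(X_test)
    test_metrics[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "sensitivity": recall_score(y_test, y_pred, pos_label=1),
        "specificity": recall_score(y_test, y_pred, pos_label=0),
        "f1": f1_score(y_test, y_pred),
    }

for name in models:
    print(f"{name}: CV accuracy={cv_scores[name]:.2f}, "
          f"test accuracy={test_metrics[name]['accuracy']:.2f}")
```

Because the data here are synthetic, the printed scores say nothing about the study's results; the sketch only shows how the comparison could be organised.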
RESULTS: The best-performing model was the Decision tree classifier, followed by the Logistic regression model (accuracy 76% and 60%, respectively). The Naïve Bayes model showed no predictive value, with 0% specificity.
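To see why 0% specificity signals a lack of predictive value, it helps to compute the reported metrics from a confusion matrix. The sketch below is an illustration under stated assumptions: the labels are made up, the helper `binary_metrics` is hypothetical, and the "always predict the positive class" behaviour is only one way a classifier could arrive at 0% specificity. The abstract does not say how the Naïve Bayes model behaved.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of true positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of true negatives that were detected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Made-up example: 7 positive and 3 negative cases.
y_true = [1] * 7 + [0] * 3
# A degenerate classifier that labels every case as positive.
always_positive = [1] * len(y_true)

metrics = binary_metrics(y_true, always_positive)
print(metrics)
```

In this toy case the degenerate classifier reaches 70% accuracy and 100% sensitivity purely because 70% of the cases are positive, while its specificity is 0%. This is why accuracy alone can mislead and why the study reports sensitivity, specificity and F1 score alongside it.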
CONCLUSION: Machine learning presents a promising and accessible toolset for improving the prediction of oral cancer outcomes. Our findings add to a growing body of evidence that Decision tree models are useful in predicting OSCC outcomes. We advise that similar future studies compare a variety of machine learning models, including Logistic regression, when evaluating model performance.