Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam, 31441, Saudi Arabia.
Comput Biol Med. 2021 Apr;131:104267. doi: 10.1016/j.compbiomed.2021.104267. Epub 2021 Feb 13.
In recent years, researchers have observed that chronic diseases are becoming more common. In the Kingdom of Saudi Arabia, the growing number of patients with thyroid cancer (TC) has become a concern. This calls for a proactive system that supports early intervention to prevent or cure the disease and so helps reduce its incidence. In this paper, we present machine learning-based tools that can serve as early warning systems by detecting TC at a very early, pre-symptomatic stage. A further aim was to achieve the highest possible accuracy while using as few features as possible. Machine learning has been applied to TC prediction before, but this is the first attempt both to use a Saudi Arabian dataset and to target diagnosis in the pre-symptomatic stage (pre-emptive diagnosis). The techniques used were random forest (RF), artificial neural network (ANN), support vector machine (SVM), and naïve Bayes (NB), each selected for its distinct capabilities. RF achieved the highest accuracy, 90.91%, while SVM, ANN, and NB achieved 84.09%, 88.64%, and 81.82%, respectively. These results were obtained using only seven of the 15 available features. Because RF outperformed the other three techniques, it is recommended for this specific problem.
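The abstract does not describe the pipeline in detail. The following is a minimal sketch, assuming scikit-learn, of how the four classifiers could be compared on a reduced subset of 7 out of 15 features. Several parts of it are assumptions and not the authors' settings: the data are synthetic (`make_classification`), and the univariate ANOVA F-test selector (`f_classif`), the 75/25 split, and all hyperparameters are illustrative choices. The printed accuracies will not reproduce the reported figures.

```python
# Sketch: compare RF, ANN, SVM and NB on a 7-of-15 feature subset.
# ASSUMPTIONS (not from the paper): synthetic data stands in for the Saudi
# dataset; SelectKBest(f_classif) is a hypothetical feature-selection choice;
# split ratio and hyperparameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 15 candidate features and a binary label (TC / no TC).
X, y = make_classification(n_samples=400, n_features=15, n_informative=7,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "NB": GaussianNB(),
}


def evaluate(models, k=7):
    """Fit scaler + top-k feature selector + classifier; return test accuracy per model."""
    results = {}
    for name, clf in models.items():
        # The selector is fitted inside the pipeline on training data only,
        # so the held-out cases do not influence which k features are kept.
        pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k), clf)
        pipe.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, pipe.predict(X_test))
    return results


results = evaluate(models)
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.2%}")
```

Wrapping feature selection and the classifier in one pipeline means any cross-validation or held-out evaluation re-selects the features on training folds only. This avoids an optimistic bias when the aim, as here, is high accuracy from a small feature subset.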