Center of Excellence in Artificial Intelligence (CoE-AI), Department of Computer Science, Bahria University, Islamabad, 04408, Pakistan.
Faculty of Resilience, Rabdan Academy, Abu Dhabi, United Arab Emirates.
BMC Med Inform Decis Mak. 2024 Jul 22;24(1):198. doi: 10.1186/s12911-024-02604-1.
Genes, expressed as sequences of nucleotides, are susceptible to mutations, some of which can lead to cancer. Machine learning and deep learning methods have become vital tools for identifying cancer-associated mutations. Thyroid cancer ranks as the fifth most prevalent cancer in the USA, with thousands of cases diagnosed annually. This paper presents an ensemble learning model leveraging deep learning techniques such as Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bi-directional LSTM (Bi-LSTM) networks to detect thyroid cancer mutations early. The model is trained on a dataset sourced from asia.ensembl.org and IntOGen.org, consisting of 633 samples with 969 mutations across 41 genes, collected from individuals of various demographics. Feature extraction uses Hahn moments, central moments, raw moments, and various matrix-based methods. Evaluation employs three testing methods: the self-consistency test (SCT), the independent set test (IST), and the 10-fold cross-validation test (10-FCVT). The proposed ensemble learning model demonstrates promising performance, achieving 96% accuracy in the IST. Statistical measures including training accuracy, testing accuracy, recall (sensitivity), specificity, Matthews correlation coefficient (MCC), loss, F1 score, and Cohen's kappa are used for comprehensive evaluation.
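Several of the statistical measures named in the abstract can be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of the standard textbook definitions, not the authors' evaluation code; the function name `binary_metrics`, the helper `_safe_div`, and the example counts are hypothetical and are not taken from the paper.

```python
import math


def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives.
    """
    total = tp + fp + tn + fn
    accuracy = _safe_div(tp + tn, total)
    sensitivity = _safe_div(tp, tp + fn)  # also called recall
    specificity = _safe_div(tn, tn + fp)
    precision = _safe_div(tp, tp + fp)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)

    # Matthews correlation coefficient (MCC)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_den)

    # Cohen's kappa: observed agreement corrected for chance agreement
    p_observed = accuracy
    p_expected = _safe_div(
        (tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), total * total
    )
    kappa = _safe_div(p_observed - p_expected, 1 - p_expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


# Illustrative counts only (hypothetical, not results from the paper).
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Recall and sensitivity are the same quantity, which is why the sketch returns a single value for both. Unlike accuracy, MCC and Cohen's kappa account for agreement expected by chance, so they can be more informative when the mutation classes are imbalanced.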