Shah Asghar Ali, Alturise Fahad, Alkhalifah Tamim, Khan Yaser Daanial
Department of Computer Science, University of Management and Technology, Lahore, Pakistan.
Department of Computer Sciences, Bahria University Lahore Campus, Lahore, Pakistan.
Digit Health. 2022 Oct 22;8:20552076221133703. doi: 10.1177/20552076221133703. eCollection 2022 Jan-Dec.
Cancer is the abnormal growth of otherwise healthy human cells. One of the major types of cancer is sarcoma, which is mostly found in bone and soft tissue cells and commonly occurs in children. According to a survey in the United States of America, more than 17,000 sarcoma patients are registered each year, which is 15% of all cancer cases. Recognizing cancer at an early stage saves many lives. This study develops a framework for the early detection of human sarcoma using deep learning Recurrent Neural Network (RNN) algorithms. The DNA of a human cell is made up of 25,000 to 30,000 genes, and each gene is represented by a sequence of nucleotides. The nucleotides in the sequence of a driver gene can change; such changes are termed mutations, and some mutations can cause cancer. There are seven types of genes whose mutations cause sarcoma. The study uses a dataset taken from more than 134 samples that includes 141 mutations in 8 driver genes. Three RNN algorithms are trained on these gene sequences: Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and Bi-directional LSTM (Bi-LSTM). The results are validated with rigorous testing techniques: self-consistency testing, independent set testing, and 10-fold cross-validation. These techniques yield several metrics, including Area Under the Curve (AUC), sensitivity, specificity, Matthews correlation coefficient, loss, and accuracy. The proposed algorithm exhibits an accuracy of 99.6% with an AUC value of 1.00.
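As an illustration of the threshold-based metrics named above, the following minimal sketch computes sensitivity, specificity, accuracy, and the Matthews correlation coefficient from binary confusion-matrix counts using their standard definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study; the authors' actual evaluation code is not described in the abstract.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives (hypothetical inputs here).
    """
    total = tp + fp + tn + fn
    # Sensitivity (recall): fraction of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Accuracy: fraction of all predictions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Made-up counts for illustration only; these are not results from the paper.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC and loss are not covered by this sketch because they depend on the model's predicted scores rather than on confusion counts alone.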