Department of Computer Science, College of Computer and Information Sciences (CCIS), King Saud University, Riyadh 11543, Saudi Arabia.
Sensors (Basel). 2019 Sep 27;19(19):4207. doi: 10.3390/s19194207.
Genomic copy number variations (CNVs) are among the most important structural variations and are linked to several diseases and cancer types. Cancer is a leading cause of death worldwide, and many studies have investigated its causes and its association with genomic changes in order to improve its management and treatment. Classifying cancer types based on CNVs belongs to this line of research. We reviewed the most successful recent methods that apply machine learning to this problem and obtained a dataset previously used to test some of them, for evaluation and comparison. We propose three deep learning techniques to classify cancer types based on CNVs: a six-layer convolutional net (CNN6), a residual six-layer convolutional net (ResCNN6), and transfer learning with a pretrained VGG16 net. In experiments on data from six cancer types, ResCNN6 reached an accuracy of 86%, followed by CNN6 at 85% and VGG16 at 77%. Prediction accuracy was lower for one class, uterine corpus endometrial carcinoma (UCEC). Repeating the experiments without this class improved the accuracies to 91% for CNN6 and 92% for ResCNN6. We observed that UCEC and ovarian serous carcinoma (OV) share a considerable subset of their features, which makes it difficult for the classifiers to learn to separate them. We then repeated the six-class experiment after balancing the classes by oversampling the training dataset, which improved both the overall and the UCEC classification accuracies.
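The class-balancing step mentioned above can be illustrated with a minimal sketch. The abstract does not say which oversampling method was used, so random duplication of minority-class samples is an assumption here, and the function name `oversample_to_balance` and the toy data are hypothetical. Only the training split is resampled, in line with the abstract.

```python
import numpy as np

def oversample_to_balance(X, y, seed=0):
    """Randomly duplicate minority-class rows until every class
    has as many samples as the largest class (assumed strategy)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    selected = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Draw the missing samples with replacement from the same class.
        extra = rng.choice(idx, size=target - count, replace=True)
        selected.append(np.concatenate([idx, extra]))
    # Shuffle so that duplicated rows are not grouped by class.
    order = rng.permutation(np.concatenate(selected))
    return X[order], y[order]

# Toy training split: 10 samples, 2 features, three imbalanced classes.
X_train = np.arange(20).reshape(10, 2)
y_train = np.array([0] * 6 + [1] * 3 + [2] * 1)

X_bal, y_bal = oversample_to_balance(X_train, y_train)
balanced_counts = np.unique(y_bal, return_counts=True)[1]
print(balanced_counts)  # each class now has 6 samples
```

The validation and test splits are left untouched in this sketch, so that reported accuracies are not inflated by duplicated samples.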