Kilicarslan Serhat, Adem Kemal, Celik Mete
Gaziosmanpasa University, Department of Informatics, 60250 Tokat, Turkey.
Aksaray University, Department of Management Information Systems, 68100 Aksaray, Turkey.
Med Hypotheses. 2020 Apr;137:109577. doi: 10.1016/j.mehy.2020.109577. Epub 2020 Jan 20.
Machine learning and deep learning methods aim to discover patterns in datasets such as microarray data and medical data. In recent years, producing microarray data from tissue and cell samples and analyzing these data has become increasingly important. Machine learning and deep learning methods have begun to be used for the diagnosis and classification of cancer from microarray data. However, microarray data are challenging to analyze because they combine a small number of samples with a large number of features, some of which may be irrelevant to the classification. For this reason, studies in the literature have focused on developing feature selection/dimension reduction techniques and classification algorithms to improve classification accuracy on microarray data. This study proposes hybrid methods that use ReliefF and stacked autoencoder approaches for dimension reduction and support vector machines (SVM) and convolutional neural networks (CNN) for classification. Three microarray datasets were used: ovarian, leukemia, and central nervous system (CNS). The ovarian dataset contains 253 samples, 15,154 genes, and 2 classes; the leukemia dataset contains 72 samples, 7,129 genes, and 2 classes; and the CNS dataset contains 60 samples, 7,129 genes, and 2 classes. Without dimension reduction, the best classification accuracy was obtained with SVM: 96.14% for the ovarian dataset, 94.83% for the leukemia dataset, and 65% for the CNS dataset. The proposed hybrid ReliefF + CNN method outperformed the other approaches, giving classification accuracies of 98.6%, 99.86%, and 83.95% for the ovarian, leukemia, and CNS datasets, respectively. The results show that dimension reduction improved the classification accuracy of both SVM and CNN.
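The feature selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the basic two-class Relief rule (one nearest hit and one nearest miss per sample) rather than the ReliefF variant named in the study, it runs on synthetic data rather than the ovarian, leukemia, or CNS datasets, and the function names `relief_weights` and `top_k` are hypothetical. The SVM and CNN classifiers are omitted; the reduced matrix is only what such a classifier would receive as input.

```python
import random


def relief_weights(X, y):
    """Score features with basic two-class Relief (one nearest hit, one nearest miss).

    X: list of samples, each a list of floats; y: list of 0/1 labels.
    Every sample is visited once, so the result is deterministic.
    Assumes each class has at least two samples.
    """
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale differences into [0, 1].
    spans = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        others = [k for k in range(n) if k != i]
        # Nearest neighbour of the same class (hit) and of the other class (miss).
        hit = min((k for k in others if y[k] == y[i]), key=lambda k: dist(X[i], X[k]))
        miss = min((k for k in others if y[k] != y[i]), key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward features that differ across classes, penalise ones that differ within a class.
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return weights


def top_k(weights, k):
    """Indices of the k highest-weighted features."""
    return sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)[:k]


# Synthetic stand-in for a microarray matrix (assumption, not data from the study):
# 40 samples, 30 features, only features 0 and 1 depend on the class label.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(4.0 * label, 0.5) if j < 2 else rng.gauss(0.0, 1.0) for j in range(30)]
     for label in y]

weights = relief_weights(X, y)
selected = top_k(weights, 2)
# Reduced matrix that a downstream classifier would be trained on.
X_reduced = [[row[j] for j in selected] for row in X]
print("selected features:", sorted(selected))
```

The sketch uses only the standard library to stay self-contained. With thousands of genes and few samples, as in the datasets above, a vectorized implementation would be the more practical choice.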