Mallick Pradeep Kumar, Mohapatra Saumendra Kumar, Chae Gyoo-Soo, Mohanty Mihir Narayan
School of Computer Engineering, KIIT (Deemed to be University), Bhubaneswar, Odisha, India.
Department of Computer Science and Engineering, ITER, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India.
Pers Ubiquitous Comput. 2023;27(3):1103-1110. doi: 10.1007/s00779-020-01467-3. Epub 2020 Oct 16.
Microarray data analysis has become a challenging area of research in recent years. Machine learning-based automated classification of gene data is essential for diagnosing gene-related malfunctions and diseases. Because the data are very large, a suitable classifier must be designed to process them. Deep learning is an advanced machine learning technique well suited to this kind of problem: with a larger number of hidden layers, it can handle large amounts of data. We present a classification method and use it to study the convergence of training a deep neural network (DNN). We assume that the inputs are non-degenerate, that the network is over-parameterized, and that the number of hidden neurons is sufficiently large. In this work, a DNN is used to classify gene expression data. The dataset contains bone marrow expression profiles of 72 leukemia patients. A five-layer DNN classifier is designed to distinguish acute lymphocytic leukemia (ALL) from acute myelocytic leukemia (AML) samples. The network is trained on 80% of the data, and the remaining 20% is used for validation. The proposed DNN classifier gives satisfactory results compared with other classifiers: the two types of leukemia are classified with 98.2% accuracy, 96.59% sensitivity, and 97.9% specificity. Computer-aided gene analyses of this kind may also be helpful to genetics and virology researchers in the future.
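The 80/20 split and the three reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `split_80_20` and `binary_metrics`, the random seed, the choice of ALL as the positive class, and the toy labels are all assumptions made for illustration, and no data from the study is reproduced.

```python
import random


def split_80_20(samples, seed=0):
    """Shuffle a copy of `samples` and return (train, validation) in an 80/20 ratio.

    Hypothetical helper: the abstract does not say how the split was drawn.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred, positive="ALL"):
    """Compute accuracy, sensitivity, and specificity for a two-class problem.

    Assumption: `positive` marks the class treated as positive (ALL here);
    the abstract does not state which class the authors used.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        # Sensitivity: fraction of positive samples correctly identified.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of negative samples correctly identified.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# 72 placeholder sample IDs, matching only the patient count in the abstract.
train_ids, val_ids = split_80_20(range(72))

# Toy labels and predictions (made up, not results from the study).
toy_true = ["ALL", "ALL", "ALL", "AML", "AML"]
toy_pred = ["ALL", "ALL", "AML", "AML", "AML"]
toy_scores = binary_metrics(toy_true, toy_pred)
```

With 72 samples, rounding 80% gives 58 training and 14 validation samples. A stratified split that preserves the ALL/AML ratio in both parts would be a reasonable alternative, but the abstract does not say whether one was used.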