Department of Biomedical Engineering, Meybod University, Meybod, Iran.
Department of Embedded Systems Engineering, College of Information Technology, Incheon National University, Incheon, Korea.
IET Syst Biol. 2022 May;16(3-4):120-131. doi: 10.1049/syb2.12044. Epub 2022 Jul 4.
Malignancies and diseases of various genetic origins can be diagnosed and classified using microarray data. The main obstacles are the large number of genes and the small number of samples in a typical microarray dataset. This paper describes a two-step strategy for analysing gene expression data across a variety of diseases: identifying the most effective genes via soft ensembling, then classifying samples with a novel deep neural network. The feature selection approach combines three wrapper-based gene selection strategies and ranks genes according to the k-nearest neighbour algorithm, resulting in a highly generalisable model with low error levels. Using soft ensembling, the most effective subsets of genes were identified from three microarray datasets: diffuse large cell lymphoma, leukaemia, and prostate cancer. A stacked deep neural network was used to classify all three datasets, achieving average accuracies of 97.51%, 99.6%, and 96.34%, respectively. In addition, two previously unreported datasets, from small round blue cell tumours (SRBCTs) and multiple sclerosis-related brain tissue lesions, were examined to show the generalisability of the method.
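The gene selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the three selectors are combined or how the k-nearest neighbour ranking is computed, so everything here is an assumption: "soft ensembling" is taken to mean averaging min-max normalised scores from three selectors, and the k-nearest neighbour step is shown as a leave-one-out accuracy check on the selected genes. The function names, the toy data, and the selector scores are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math


def min_max(scores):
    """Rescale one selector's gene scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def soft_ensemble(score_lists, top_k):
    """Average normalised scores across selectors (assumed meaning of
    'soft ensembling') and return the indices of the top_k genes."""
    normalised = [min_max(s) for s in score_lists]
    n_genes = len(normalised[0])
    mean_scores = [
        sum(sel[g] for sel in normalised) / len(normalised)
        for g in range(n_genes)
    ]
    ranked = sorted(range(n_genes), key=lambda g: mean_scores[g], reverse=True)
    return ranked[:top_k], mean_scores


def knn_loo_accuracy(X, y, genes, k=3):
    """Leave-one-out accuracy of a k-nearest neighbour classifier that
    only sees the chosen gene columns."""
    correct = 0
    for i, xi in enumerate(X):
        dists = sorted(
            (math.dist([xi[g] for g in genes], [xj[g] for g in genes]), y[j])
            for j, xj in enumerate(X)
            if j != i
        )
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


# Toy expression matrix (6 samples x 3 genes); only gene 0 separates the classes.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 1.1],
    [0.0, 3.0, 0.9],
    [0.9, 4.0, 1.0],
    [1.0, 2.0, 1.2],
    [0.8, 0.5, 0.8],
]
y = [0, 0, 0, 1, 1, 1]

# Hypothetical per-gene scores from three different selectors.
selector_scores = [
    [0.9, 0.2, 0.1],
    [0.8, 0.1, 0.3],
    [0.7, 0.4, 0.0],
]

selected, mean_scores = soft_ensemble(selector_scores, top_k=1)
accuracy = knn_loo_accuracy(X, y, selected, k=3)
print("selected genes:", selected, "leave-one-out accuracy:", accuracy)
```

Averaging normalised scores lets a gene that is rated moderately well by all three selectors outrank one that a single selector favours strongly, which is one plausible reading of why an ensemble would generalise better than any single selector on small-sample microarray data.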