University of Sulaimani, College of Science, Computer Department, Sulaymaniyah, Iraq.
Adv Exp Med Biol. 2021;1338:231-237. doi: 10.1007/978-3-030-78775-2_28.
The support vector machine (SVM) is a machine learning algorithm with high classification accuracy. However, its training complexity is very high, so it is inefficient on large datasets. In this study, we use multi-class support vector machines combined with systematic sampling with hierarchical clustering (SSHC-MCSVM) to classify gene expression data. Gene expression profiles are treated as large datasets; the two datasets used in this study come from obese and lean individuals. In the proposed SSHC-MCSVM algorithm, the gene expression data are regrouped into new sets of genes using the systematic sampling with hierarchical clustering (SSHC) algorithm. The SSHC algorithm is repeated n times, and the k-partitions whose clusters have a high adjusted Rand index (ARI) are chosen. Multi-class support vector machines are then applied to the best regrouped gene expression data to classify the significant genes. Performance is measured by accuracy, recall, and precision. The proposed SSHC-MCSVM algorithm was able to classify the significant genes with high accuracy, recall, and precision.
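The pipeline described in the abstract (systematic sampling, hierarchical clustering, ARI-based selection, multi-class SVM) might be sketched roughly as follows with scikit-learn. This is an illustrative reading, not the authors' implementation, and it rests on several assumptions the abstract does not confirm: the data are synthetic stand-ins for gene expression profiles; each sampled clustering is extended to all genes by nearest-centroid assignment; and, because the abstract does not say what the ARI is computed against, each partition is scored here by its mean ARI against the other repeats. The function names `sshc_partition` and `best_partition` are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import (accuracy_score, adjusted_rand_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def sshc_partition(X, k, step, start):
    """One SSHC run: cluster a systematic sample, then label every row of X."""
    # Systematic sampling: every `step`-th row, beginning at offset `start`.
    idx = np.arange(start, len(X), step)
    sample = X[idx]
    # Hierarchical (agglomerative) clustering of the sample into k clusters.
    sample_labels = AgglomerativeClustering(n_clusters=k).fit_predict(sample)
    # Assumption: extend the partition to all rows via the nearest centroid.
    centroids = np.vstack(
        [sample[sample_labels == c].mean(axis=0) for c in range(k)]
    )
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def best_partition(X, k, step, n_repeats, rng):
    """Repeat SSHC with distinct random starts; keep the highest-ARI partition."""
    # Requires 2 <= n_repeats <= step so that the starts are distinct.
    starts = rng.choice(step, size=n_repeats, replace=False)
    parts = [sshc_partition(X, k, step, int(s)) for s in starts]
    # Assumption: score each partition by its mean ARI against the others.
    scores = [
        np.mean([adjusted_rand_score(p, q)
                 for j, q in enumerate(parts) if j != i])
        for i, p in enumerate(parts)
    ]
    best = int(np.argmax(scores))
    return parts[best], float(scores[best])


# Synthetic stand-in for expression data: 300 "genes" x 10 "samples",
# drawn from three well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([c + rng.normal(size=(100, 10)) for c in (0.0, 8.0, 16.0)])

labels, ari = best_partition(X, k=3, step=5, n_repeats=5, rng=rng)

# Multi-class SVM trained on the selected regrouping.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.3, stratify=labels, random_state=0
)
clf = SVC(kernel="linear").fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

acc = accuracy_score(y_te, y_hat)
rec = recall_score(y_te, y_hat, average="macro")
prec = precision_score(y_te, y_hat, average="macro")
print(f"ARI={ari:.3f} accuracy={acc:.3f} recall={rec:.3f} precision={prec:.3f}")
```

Clustering only the systematic sample keeps the hierarchical step cheap, since agglomerative clustering scales poorly with the number of rows; the SVM is then trained on the regrouped data rather than on arbitrary labels.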