School of Artificial Intelligence, Henan University, Kaifeng 475004, China.
School of Computer and Information Engineering, Henan University, Kaifeng 475004, China.
Sensors (Basel). 2021 May 23;21(11):3627. doi: 10.3390/s21113627.
Identifying the key tumor-related genes in high-dimensional gene expression data is important for accurate tumor classification and for making specific treatment decisions. In recent years, unsupervised feature selection algorithms have attracted considerable attention in gene selection because they can find the most discriminative subsets of genes, that is, the latent information in biological data. Recent research also shows that preserving the important structure of the data is necessary for gene selection. However, most current feature selection methods capture only the local structure of the original data and ignore its global structure. We believe that the global and local structures of the original data are equally important, so the selected genes should preserve the essential structure of the original data as far as possible. In this paper, we propose a new adaptive unsupervised feature selection scheme that reconstructs high-dimensional data in a low-dimensional space under a feature distance invariance constraint and applies the ℓ2,1-norm to a matrix embedded in the local manifold structure-learning framework, which gives that matrix the ability to perform gene selection. Moreover, an effective algorithm is developed to solve the optimization problem arising from the proposed scheme. Comparative experiments against several classical schemes on real tumor datasets demonstrate the effectiveness of the proposed method.
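To illustrate how an ℓ2,1-norm penalty lets a matrix perform gene selection, the minimal sketch below computes the ℓ2,1-norm of a matrix (the sum of the Euclidean norms of its rows) and ranks features by row norm. The matrix `W`, the function names, and the toy values are assumptions made for illustration only; the sketch does not reproduce the optimization algorithm proposed in the paper.

```python
import numpy as np


def l21_norm(W: np.ndarray) -> float:
    """Return the l2,1-norm of W: the sum of the Euclidean norms of its rows."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def select_features(W: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k rows of W with the largest Euclidean norm.

    Each row of W is assumed (hypothetically) to correspond to one gene.
    Penalizing the l2,1-norm drives whole rows toward zero, so rows with
    large norms mark the genes that the learned matrix still relies on.
    """
    scores = np.linalg.norm(W, axis=1)
    # A stable sort keeps the original order among rows with equal norms.
    order = np.argsort(-scores, kind="stable")
    return order[:k]


# Toy matrix: 4 features (rows) mapped to a 2-dimensional space (columns).
W = np.array([
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # row norm 0, a feature that would be discarded
    [1.0, 0.0],  # row norm 1
    [0.0, 2.0],  # row norm 2
])

total = l21_norm(W)
top_two = select_features(W, 2)
```

In this toy case the all-zero row contributes nothing to the norm and is ranked last, which is the row-sparsity effect that makes the ℓ2,1-norm a common choice for feature selection.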