School of Information Science and Engineering, Qufu Normal University, Rizhao, China.
Library of Qufu Normal University, Qufu Normal University, Rizhao, China.
Comput Biol Chem. 2020 Dec;89:107368. doi: 10.1016/j.compbiolchem.2020.107368. Epub 2020 Sep 1.
With the development of cancer research, gene expression datasets containing cancer information are growing explosively. In addition, as single-cell RNA sequencing (scRNA-seq) technology continues to mature, the protein and lineage information of single cells is also being mined continuously. Classifying these high-dimensional data correctly is a technical challenge. In recent years, the Extreme Learning Machine (ELM) has been widely used in both supervised and unsupervised learning. However, the traditional ELM does not take robustness into account. To improve the robustness of ELM, this paper proposes a novel ELM method based on the L-norm, named L-Extreme Learning Machine (L-ELM). The method introduces the L-norm into the loss function to improve robustness and to minimize the influence of noise and outliers. First, we evaluate the new method on five UCI datasets; the experimental results show that it achieves competitive results. Next, the method is applied to the classification of cancer samples and single-cell RNA sequencing datasets. The experimental results on The Cancer Genome Atlas (TCGA) datasets and scRNA-seq datasets show that ELM and its variants have great potential in the classification of cancer samples.
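The idea of an ELM with a norm-based robust loss can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption rather than the authors' algorithm: the loss is taken to be the row-wise L2,1 norm of the residual matrix (one common robust choice), a ridge penalty is added on the output weights, the solver is iteratively reweighted least squares, and the activation is tanh. The names `train_elm` and `predict_elm` and all parameters are hypothetical.

```python
import numpy as np

def train_elm(X, T, n_hidden=50, reg=1e-2, n_iter=20, eps=1e-4, seed=0):
    """Fit a single-hidden-layer ELM with an assumed L2,1-norm loss.

    Minimizes ||H @ beta - T||_{2,1} + reg * ||beta||_F^2 by iteratively
    reweighted least squares. X is (n_samples, n_features); T is a one-hot
    target matrix of shape (n_samples, n_classes).
    """
    rng = np.random.default_rng(seed)
    # Hidden-layer weights and biases are random and never trained (core ELM idea).
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    # Per-sample weights; all ones makes the first pass an ordinary ridge ELM.
    d = np.ones(X.shape[0])
    for _ in range(n_iter):
        HtD = H.T * d  # H^T D with D = diag(d)
        beta = np.linalg.solve(HtD @ H + reg * np.eye(n_hidden), HtD @ T)
        # Samples with large residuals (likely outliers) get small weights.
        residual_norms = np.linalg.norm(H @ beta - T, axis=1)
        d = 1.0 / (2.0 * np.maximum(residual_norms, eps))
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Return the predicted class index for each row of X."""
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Synthetic demo (not data from the paper): two Gaussian blobs, with 10% of
# the training labels flipped to mimic label noise.
rng = np.random.default_rng(42)
X_train = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y_train = np.repeat([0, 1], 100)
y_noisy = y_train.copy()
flip = rng.choice(200, size=20, replace=False)
y_noisy[flip] = 1 - y_noisy[flip]

W, b, beta = train_elm(X_train, np.eye(2)[y_noisy])

X_test = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y_test = np.repeat([0, 1], 100)
accuracy = np.mean(predict_elm(X_test, W, b, beta) == y_test)
print(f"accuracy on clean test data: {accuracy:.2f}")
```

In this sketch the reweighting step is what provides robustness: a sample whose residual stays large across iterations contributes less to the next least-squares solve, whereas a squared-error ELM would let it dominate the fit.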