DCRELM: dual correlation reduction network-based extreme learning machine for single-cell RNA-seq data clustering.

Affiliation

School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China.

Publication information

Sci Rep. 2024 Jun 12;14(1):13541. doi: 10.1038/s41598-024-64217-y.

DOI:10.1038/s41598-024-64217-y

PMID:38866896

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11169517/

Abstract

Single-cell ribonucleic acid sequencing (scRNA-seq) is a high-throughput genomic technique used to investigate single-cell transcriptomes. Cluster analysis can reveal the heterogeneity and diversity of cells in scRNA-seq data, but existing clustering algorithms struggle with the inherent high dimensionality, noise, and sparsity of such data. To overcome these limitations, we propose a clustering algorithm: the Dual Correlation Reduction network-based Extreme Learning Machine (DCRELM). First, DCRELM maps scRNA-seq data into an extreme learning machine (ELM) random mapping space to obtain low-dimensional, dense features. Second, an ELM graph distortion module produces two views of these features, enhancing their robustness. Third, an autoencoder fusion module learns the attribute and structural information of the features and merges the two to generate consistent latent representations. Fourth, a dual information reduction network filters redundant information and noise from the two consistent latent representations. Last, a triplet self-supervised learning mechanism further improves clustering performance. Extensive experiments show that DCRELM performs well in terms of clustering performance and robustness. The code is available at https://github.com/gaoqingyun-lucky/awesome-DCRELM .
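
As a rough illustration of the first and fourth steps, the following minimal NumPy sketch maps a sparse count matrix through a fixed random hidden layer, which is the generic idea behind an ELM random mapping, and then scores how far the cross-view feature similarity of two views is from the identity matrix, one common way to express redundancy between views. It is not taken from the authors' repository. The function names `elm_random_mapping` and `cross_view_redundancy`, the hidden size, the sigmoid activation, the weight scaling, the log1p preprocessing, the synthetic Poisson counts, and the Gaussian perturbation that stands in for the graph distortion module are all assumptions made for this example.

```python
import numpy as np


def elm_random_mapping(X, n_hidden=32, seed=0):
    """Project X (cells x genes) through a random, untrained hidden layer.

    The weights and biases are sampled once and never updated, which is the
    defining trait of an ELM hidden layer. The hidden size, the sigmoid
    activation, and the 1/sqrt(n_genes) scaling are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_genes, n_hidden)) / np.sqrt(n_genes)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    Z = X @ W + b
    # Sigmoid written with tanh so that large |Z| cannot overflow.
    return 0.5 * (1.0 + np.tanh(0.5 * Z))


def cross_view_redundancy(H1, H2):
    """Mean squared gap between cross-view feature similarity and identity.

    Columns are scaled to unit length, so entry (i, j) of S is the cosine
    similarity between feature i of view 1 and feature j of view 2.
    """
    H1n = H1 / np.linalg.norm(H1, axis=0, keepdims=True)
    H2n = H2 / np.linalg.norm(H2, axis=0, keepdims=True)
    S = H1n.T @ H2n
    return float(np.mean((S - np.eye(S.shape[0])) ** 2))


# Synthetic stand-in for an scRNA-seq count matrix: 100 cells x 500 genes,
# mostly zeros (hypothetical data, not from the paper).
rng = np.random.default_rng(42)
counts = rng.poisson(0.3, size=(100, 500))
X = np.log1p(counts)

H = elm_random_mapping(X, n_hidden=32)               # dense, low-dimensional
H_noisy = H + rng.normal(0.0, 0.01, size=H.shape)    # assumed second view
print(H.shape, cross_view_redundancy(H, H_noisy))
```

In a trained model a score like this would be minimized as a loss so that different feature dimensions carry less overlapping information; here it is only evaluated once on untrained features.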
