Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China.
School of Computer Science and Technology, Fudan University, Shanghai, China.
BMC Bioinformatics. 2022 May 5;23(1):161. doi: 10.1186/s12859-022-04703-0.
With the development of modern sequencing technology, hundreds of thousands of single-cell RNA-sequencing (scRNA-seq) profiles make it possible to explore heterogeneity at the cell level, but their analysis faces the challenges of high dimensionality and high sparsity. Dimensionality reduction is essential for downstream analysis, such as clustering to identify cell subpopulations. Dimensionality reduction usually follows an unsupervised approach.
In this paper, we introduce a semi-supervised dimensionality reduction method named scSemiAE, which is based on an autoencoder model. It transfers the information contained in available datasets with cell subpopulation labels to guide the search for better low-dimensional representations, which can ease further analysis.
Experiments on five public datasets show that scSemiAE outperforms both unsupervised and semi-supervised baselines regardless of how much information is transferred, whether measured by the number of labeled cells or by the number of labeled cell subpopulations.
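To make the idea of label-guided dimensionality reduction concrete, the following is a minimal sketch of one way a semi-supervised autoencoder objective could be assembled. It is not the scSemiAE loss, which this abstract does not specify. The sketch assumes a reconstruction term computed over all cells plus a term that pulls the embeddings of labeled cells toward the centroid of their subpopulation. The function names (`mean_squared_error`, `centroid_loss`, `semi_supervised_loss`), the `weight` parameter, and the use of `None` to mark unlabeled cells are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def mean_squared_error(a, b):
    """Mean squared difference between two equally shaped lists of vectors."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += (va - vb) ** 2
            count += 1
    return total / count


def centroid_loss(z, labels):
    """Mean squared distance of each labeled embedding to its label centroid.

    Assumption for this sketch: a label of None marks an unlabeled cell,
    which is ignored by this term.
    """
    groups = defaultdict(list)
    for vec, lab in zip(z, labels):
        if lab is not None:
            groups[lab].append(vec)
    total, count = 0.0, 0
    for vecs in groups.values():
        dim = len(vecs[0])
        # Centroid of all labeled embeddings that share this label.
        center = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
        for v in vecs:
            total += sum((v[d] - center[d]) ** 2 for d in range(dim))
            count += 1
    return total / count if count else 0.0


def semi_supervised_loss(x, x_hat, z, labels, weight=1.0):
    """Reconstruction loss on all cells plus a weighted label-guided term.

    x      : input expression vectors, one per cell
    x_hat  : autoencoder reconstructions of x
    z      : low-dimensional embeddings produced by the encoder
    labels : subpopulation label per cell, or None if unlabeled
    weight : hypothetical trade-off between the two terms
    """
    return mean_squared_error(x, x_hat) + weight * centroid_loss(z, labels)
```

In this sketch the unlabeled cells contribute only through the reconstruction term, while the labeled cells additionally shape the embedding space. With `weight=0` the objective reduces to that of a plain unsupervised autoencoder.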