Joint learning dimension reduction and clustering of single-cell RNA-sequencing data.

Affiliation

School of Computer Science and Technology, Xidian University, Xi'an, China.

Publication information

Bioinformatics. 2020 Jun 1;36(12):3825-3832. doi: 10.1093/bioinformatics/btaa231.

Abstract

MOTIVATION

Single-cell RNA-sequencing (scRNA-seq) profiles the transcriptome of individual cells, which enables the discovery of cell types or subtypes through unsupervised clustering. Current algorithms perform dimension reduction before cell clustering because scRNA-seq data are noisy, high-dimensional and not linearly separable. However, treating dimension reduction and clustering as independent steps fails to fully characterize patterns in the data, resulting in suboptimal performance.

RESULTS

In this study, we propose a flexible and accurate algorithm for scRNA-seq data that jointly learns dimension reduction and cell clustering (DRjCC), where dimension reduction is performed by projected matrix decomposition and cell type clustering by non-negative matrix factorization. We first formulate the joint learning of dimension reduction and cell clustering as a constrained optimization problem and then derive the optimization rules. The advantage of DRjCC is that feature selection in dimension reduction is guided by cell clustering, which significantly improves the performance of cell type discovery. Eleven scRNA-seq datasets are used to validate the performance of the algorithms, with the number of single cells varying from 49 to 68,579 and the number of cell types ranging from 3 to 14. The experimental results demonstrate that DRjCC significantly outperforms 13 state-of-the-art methods on various measures of cell type clustering (an average improvement of 17.44%). Furthermore, DRjCC is efficient and robust across different scRNA-seq datasets from various tissues. The proposed model and methods provide an effective strategy for analyzing scRNA-seq data.
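As a rough illustration of the clustering component only, the sketch below factorizes a small non-negative expression matrix with plain non-negative matrix factorization (standard multiplicative updates) and assigns each cell to its dominant factor. It is not the DRjCC algorithm: it omits the projected matrix decomposition and the joint optimization described above, and the authors' implementation is in MATLAB. The function name `nmf_cluster`, its parameters and the toy data are assumptions made for this example.

```python
import numpy as np

def nmf_cluster(X, k, n_iter=200, seed=0, eps=1e-10):
    """Plain NMF (hypothetical helper, not DRjCC).

    Approximates a non-negative matrix X (genes x cells) as W @ H and
    labels each cell by the row of H with the largest coefficient.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k)) + eps   # gene loadings per factor
    H = rng.random((k, n_cells)) + eps   # factor weights per cell
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius-norm objective;
        # eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    labels = H.argmax(axis=0)            # one cluster label per cell
    return W, H, labels, errors

# Toy data (assumed): 20 genes x 30 cells, two groups of cells with
# disjoint marker genes plus a small amount of uniform noise.
rng = np.random.default_rng(1)
X = np.zeros((20, 30))
X[:10, :15] = 5.0
X[10:, 15:] = 5.0
X += 0.1 * rng.random(X.shape)

W, H, labels, errors = nmf_cluster(X, k=2)
print(labels)
```

In a two-stage pipeline, `X` would first be reduced by a separate method and only then clustered; the approach described in the abstract instead couples the two steps so that clustering guides feature selection.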

AVAILABILITY AND IMPLEMENTATION

The software is implemented in MATLAB and is freely available for academic use at https://github.com/xkmaxidian/DRjCC.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
