

Clustering Single-Cell RNA-Seq Data with Regularized Gaussian Graphical Model.

Affiliation

Department of Public Health Sciences, Pennsylvania State University College of Medicine, 500 University Drive, Hershey, PA 17033, USA.

Publication information

Genes (Basel). 2021 Feb 22;12(2):311. doi: 10.3390/genes12020311.

Abstract

Single-cell RNA-seq (scRNA-seq) is a powerful tool for measuring the expression patterns of individual cells and for discovering heterogeneity and functional diversity among cell populations. Because of the variability in such data, analyzing it efficiently is challenging. Many clustering methods have been developed, each with at least one free parameter. Different choices for these free parameters may lead to substantially different visualizations and clusters, and tuning them is time-consuming. Thus, there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ = 2 for AIC (Akaike information criterion) or λ = log(p) for BIC (Bayesian information criterion), without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. RGGC therefore has no free parameter to tune. When evaluated on simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. When applied to glioblastoma scRNA-seq data, it can detect inter-sample cell heterogeneity.
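The abstract names partial correlations and the two fixed penalty choices but does not give the estimator itself. As a rough illustration only, and not the RGGC implementation, the sketch below shows the standard Gaussian graphical model relationship between a precision (inverse covariance) matrix and partial correlations, together with the two penalty values mentioned above. The function names `partial_correlations` and `penalty`, the toy covariance matrix, and the value `p=1000` are all assumptions made for this example; the abstract does not define `p`.

```python
import math

import numpy as np


def partial_correlations(cov):
    """Return the partial correlations implied by a covariance matrix.

    Standard identity for Gaussian graphical models (not specific to RGGC):
    rho_ij = -theta_ij / sqrt(theta_ii * theta_jj), where theta = inv(cov).
    """
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def penalty(criterion, p):
    """Hypothetical helper: the fixed lambda values quoted in the abstract."""
    if criterion == "AIC":
        return 2.0
    if criterion == "BIC":
        return math.log(p)
    raise ValueError(f"unknown criterion: {criterion}")


# Toy chain 0 - 1 - 2: variables 0 and 2 are marginally correlated (0.25)
# but conditionally independent given variable 1.
cov = np.array([
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
])
pcor = partial_correlations(cov)
print(np.round(pcor, 3))

# p = 1000 is an arbitrary value chosen for illustration.
print(penalty("AIC", p=1000), penalty("BIC", p=1000))
```

In this toy case the partial correlation between variables 0 and 2 vanishes even though their marginal correlation does not, which is the kind of indirect association that a graphical model is meant to filter out before a community detection step such as Louvain is applied to the resulting graph.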


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b52/7927011/3e62c27f3cf7/genes-12-00311-g001.jpg
