A parameter-free deep embedded clustering method for single-cell RNA-seq data.

Affiliations

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China.

Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Guangzhou 510000, China.

Publication information

Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac172.

Abstract

Clustering analysis is widely used on single-cell ribonucleic acid (RNA)-sequencing (scRNA-seq) data to discover cell heterogeneity and cell states. Although many clustering methods have been developed for scRNA-seq analysis, most of them require the number of clusters to be specified in advance. However, the exact number of cell types is rarely known beforehand, and estimates based on experience are not always reliable. Here, we present ADClust, an automatic deep embedding clustering method for scRNA-seq data that can accurately cluster cells without requiring a predefined number of clusters. Specifically, ADClust first obtains low-dimensional representations through a pre-trained autoencoder and uses these representations to cluster cells into initial micro-clusters. The micro-clusters are then compared with one another by a statistical test, and similar micro-clusters are merged into larger clusters. Based on the resulting clustering, cell representations are updated so that each cell is pulled toward the centers of its assigned cluster and of similar clusters, while cells are kept apart to preserve the distances between clusters. This is accomplished by jointly optimizing the carefully designed clustering and autoencoder loss functions. The merging process continues until convergence. ADClust was tested on 11 real scRNA-seq datasets and outperformed existing methods in terms of both clustering performance and the accuracy of the estimated number of clusters. More importantly, the model offers high speed and scalability for large datasets.
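The merging step described in the abstract can be illustrated with a minimal sketch. It is not the ADClust implementation: the abstract does not name the statistical test, so the sketch substitutes a simple separation score (centroid distance divided by the pooled standard deviation along the axis joining the two centroids), and it omits the autoencoder and the representation update entirely. The function names, the toy data and the threshold of 2.0 are all assumptions made for illustration. The threshold is loosely motivated by the fact that an equal-weight mixture of two normal distributions with a common variance is bimodal only when the means differ by more than two standard deviations.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pvariance


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [mean(col) for col in zip(*points)]


def separation(a, b):
    """Stand-in similarity score (an assumption, not the test used by ADClust):
    centroid distance divided by the pooled standard deviation of both clusters
    projected onto the axis joining their centroids."""
    ca, cb = centroid(a), centroid(b)
    axis = [y - x for x, y in zip(ca, cb)]
    norm = sqrt(sum(v * v for v in axis))
    if norm == 0.0:
        return 0.0  # identical centroids: treat as maximally similar
    unit = [v / norm for v in axis]
    proj_a = [sum(x * u for x, u in zip(p, unit)) for p in a]
    proj_b = [sum(x * u for x, u in zip(p, unit)) for p in b]
    pooled_var = (pvariance(proj_a) * len(a) + pvariance(proj_b) * len(b)) / (len(a) + len(b))
    return norm / sqrt(pooled_var) if pooled_var > 0 else float("inf")


def merge_micro_clusters(clusters, threshold=2.0):
    """Repeatedly merge the most similar pair until no pair scores at or below
    the threshold. The number of clusters is an output, not an input."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        score, i, j = min(
            (separation(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if score > threshold:
            break  # every remaining pair looks well separated
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy 2-D "embeddings": two overlapping micro-clusters near the origin and
# two overlapping micro-clusters far away from them.
micro = [
    [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)],
    [(1.0, 0.0), (3.0, 0.0), (1.0, 2.0), (3.0, 2.0)],
    [(20.0, 20.0), (22.0, 20.0), (20.0, 22.0), (22.0, 22.0)],
    [(21.0, 20.0), (23.0, 20.0), (21.0, 22.0), (23.0, 22.0)],
]
merged = merge_micro_clusters(micro)
print(len(merged), [len(c) for c in merged])
```

On this toy input the four micro-clusters collapse into two groups without the number of clusters being supplied. In the method as described, merging alternates with updating the cell representations until convergence, which this sketch does not attempt.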

