CHAI: consensus clustering through similarity matrix integration for cell-type identification.

Author affiliations

Integrative Life Sciences, Virginia Commonwealth University, Richmond, VA 23284, United States.

Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States.

Publication information

Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae411.

DOI:10.1093/bib/bbae411

PMID:39207729

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11359802/

Abstract

Several methods have been developed to computationally predict cell types from single-cell RNA sequencing (scRNAseq) data. As methods accumulate, a common problem for investigators is identifying which method best suits their specific use case. To address this challenge, we present CHAI (consensus Clustering tHrough similArIty matrix integratIon for single cell-type identification), a wisdom-of-crowds approach for scRNAseq clustering. CHAI offers two alternative methods, CHAI-AvgSim and CHAI-SNF, each of which aggregates the clustering results from seven state-of-the-art clustering methods. CHAI-AvgSim and CHAI-SNF demonstrate superior performance across several benchmarking datasets. Furthermore, both CHAI methods outperform the most recent consensus clustering method, SAME-clustering. We demonstrate CHAI's practical use by identifying a leader tumor cell cluster enriched with CDH3. CHAI also provides a platform for multiomic integration, and we show that CHAI-SNF performs better when spatial transcriptomics data are included. CHAI overcomes previous limitations by incorporating the most recent and top-performing scRNAseq clustering algorithms into the aggregation framework. It is also an intuitive and easily customizable R package: users may add their own clustering methods to the pipeline, or down-select only the ones they want to use for the clustering aggregation. This ensures that, as more advanced clustering algorithms are developed, CHAI will remain useful to the community as a generalized framework. CHAI is available as an open-source R package on GitHub: https://github.com/lodimk2/chai.
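
The idea of aggregating several clusterings through similarity matrices can be illustrated with a minimal sketch. It is not the CHAI implementation, and it does not use the CHAI R package API. It assumes that each method's output is turned into a binary cell-by-cell co-association matrix and that these matrices are averaged, in the spirit of what the name CHAI-AvgSim suggests. The function names, the toy label vectors, and the final step (connected components above a 0.5 threshold) are all hypothetical choices made for illustration. The abstract does not say how the integrated matrix is clustered.

```python
def coassociation(labels):
    """Binary similarity matrix: 1.0 where two cells share a cluster label."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def average_similarity(label_sets):
    """Element-wise mean of the co-association matrices of all methods."""
    n = len(label_sets[0])
    mats = [coassociation(labels) for labels in label_sets]
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]


def consensus_labels(avg, threshold=0.5):
    """Assumed stand-in for the final clustering step: connected components
    of the graph that links cells whose average similarity exceeds threshold."""
    n = len(avg)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and avg[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy input: three hypothetical methods clustering six cells.
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, different label names
    [0, 0, 1, 1, 2, 2],  # a method that disagrees on cells 2 and 3
]
avg = average_similarity(runs)
consensus = consensus_labels(avg)
print(consensus)
```

Working on co-association matrices rather than raw labels sidesteps the fact that label names are arbitrary across methods: the first two runs above contribute identical matrices even though their labels are swapped.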
