Computational Systems Biology Group, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia.
School of Mathematics and Statistics, Faculty of Science, University of Sydney, NSW 2006, Australia.
Bioinformatics. 2023 Jun 1;39(6). doi: 10.1093/bioinformatics/btad382.
Recent advances in multimodal single-cell omics technologies enable multiple modalities of molecular attributes, such as gene expression, chromatin accessibility, and protein abundance, to be profiled simultaneously at a global level in individual cells. While the increasing availability of multiple data modalities is expected to provide a more accurate clustering and characterization of cells, the development of computational methods that are capable of extracting information embedded across data modalities is still in its infancy.
We propose SnapCCESS, an unsupervised ensemble deep learning framework that clusters cells by integrating the data modalities in multimodal single-cell omics data. By creating snapshots of multimodal embeddings with variational autoencoders, SnapCCESS can be coupled with various clustering algorithms to generate a consensus clustering of cells. We applied SnapCCESS with several clustering algorithms to various datasets generated from popular multimodal single-cell omics technologies. Our results demonstrate that SnapCCESS is effective and more efficient than conventional ensemble deep learning-based clustering methods, and that it outperforms other state-of-the-art multimodal embedding generation methods in integrating data modalities for clustering cells. The improved clustering of cells from SnapCCESS will pave the way for more accurate characterization of cell identity and types, an essential step for various downstream analyses of multimodal single-cell omics data.
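The idea of combining several embedding snapshots into one consensus clustering can be illustrated with a minimal sketch. This is not the SnapCCESS implementation: the abstract does not specify the consensus scheme, so the co-association approach below is an assumption, the function name `consensus_clustering` is hypothetical, and the "snapshots" are noisy synthetic embeddings standing in for the outputs of a variational autoencoder.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans


def consensus_clustering(snapshots, n_clusters, seed=0):
    """Cluster each snapshot embedding, then merge the partitions.

    snapshots: list of (n_cells, n_dims) arrays, one per snapshot.
    Returns an array of consensus labels in 1..n_clusters.
    Hypothetical helper for illustration; not part of SnapCCESS.
    """
    n_cells = snapshots[0].shape[0]
    coassoc = np.zeros((n_cells, n_cells))
    for i, emb in enumerate(snapshots):
        # Base clustering of one snapshot (k-means is one possible choice).
        labels = KMeans(
            n_clusters=n_clusters, n_init=10, random_state=seed + i
        ).fit_predict(emb)
        # Count how often each pair of cells lands in the same cluster.
        coassoc += labels[:, None] == labels[None, :]
    coassoc /= len(snapshots)

    # Turn co-association frequencies into distances and cut a tree on them.
    dist = 1.0 - coassoc
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic stand-in for VAE snapshots: 3 well-separated groups of 30 cells,
# each snapshot being the same group centres plus independent noise.
rng = np.random.default_rng(0)
true_labels = np.repeat([0, 1, 2], 30)
centers = np.eye(3, 5) * 20.0
base = centers[true_labels]
snapshots = [base + rng.normal(scale=1.0, size=base.shape) for _ in range(5)]

consensus = consensus_clustering(snapshots, n_clusters=3)
print(np.unique(consensus, return_counts=True))
```

Because the consensus step only consumes cluster labels, the base clusterer can be swapped for another algorithm without changing the rest of the sketch, which mirrors the abstract's point that the snapshots can be coupled with various clustering algorithms.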
SnapCCESS is implemented as a Python package and is freely available from https://github.com/PYangLab/SnapCCESS under the GPL-3 open-source license. The data used in this study are publicly available (see section 'Data availability').