Center for Quantitative Biology, Peking University, Beijing 100871, China.
Huawei Technologies Co., Ltd., Beijing 100080, China.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac736.
Single-cell multi-omics sequencing techniques have developed rapidly in the past few years. Clustering analysis of single-cell multi-omics data may offer novel perspectives for dissecting cellular heterogeneity. However, multi-omics data are inherently high-dimensional and highly sparse, and they contain doublets. Moreover, representations of different omics follow diverse distributions, even when they come from the same cell. Without proper distribution alignment, clustering methods tend to produce poorly separated clusters that are easily skewed by less informative omics data.
We developed MoClust, a novel joint clustering framework that can be applied to several types of single-cell multi-omics data. A selective automatic doublet detection module, which identifies and filters out doublets, is introduced in the pretraining stage to improve data quality. Omics-specific autoencoders are introduced to characterize the multi-omics data. A contrastive-learning-based distribution alignment is adopted to adaptively fuse the omics representations into an omics-invariant representation. This alignment improves the compactness and separability of clusters while accurately weighting the contribution of each omics to the clustering objective. Extensive experiments on both simulated and real multi-omics datasets demonstrated the strong alignment, doublet detection and clustering capabilities of MoClust.
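To make the idea of contrastive alignment between omics representations concrete, the following is a minimal sketch and not MoClust's actual loss, which the abstract does not specify. It assumes a generic InfoNCE-style objective in which the embeddings of the same cell from two omics (for example, outputs of two omics-specific encoders) form the positive pair and the embeddings of other cells serve as negatives. The function names, the temperature value and the toy embeddings are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(z_a: Sequence[Vector], z_b: Sequence[Vector],
             temperature: float = 0.5) -> float:
    """Mean InfoNCE loss between two sets of per-cell embeddings.

    Row i of z_a and row i of z_b are assumed to come from the same cell
    (positive pair); every other row of z_b acts as a negative for row i.
    This is a generic contrastive objective, not the published MoClust loss.
    """
    if len(z_a) != len(z_b) or len(z_a) == 0:
        raise ValueError("z_a and z_b must be non-empty and equally long")
    a = [l2_normalize(v) for v in z_a]
    b = [l2_normalize(v) for v in z_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarities of cell i (omics A) to all cells (omics B).
        logits = [dot(anchor, other) / temperature for other in b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching cell as the correct class.
        total += log_denom - logits[i]
    return total / len(a)


# Toy embeddings for three cells in two omics (hypothetical values).
rna = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
adt = [[0.9, 0.1], [0.1, 0.9], [-1.0, 0.1]]

aligned_loss = info_nce(rna, adt)
# Rotating the second omics breaks the cell-to-cell correspondence.
shuffled_loss = info_nce(rna, adt[1:] + adt[:1])
```

In this toy setting the loss is lower when the two omics embeddings of each cell agree than when the cell correspondence is broken, which is the property a contrastive alignment term exploits when pulling omics representations toward a shared space.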
An implementation of MoClust is available from https://doi.org/10.5281/zenodo.7306504.
Supplementary data are available at Bioinformatics online.