Clustering single-cell multi-omics data with MoClust.

Affiliations

Center for Quantitative Biology, Peking University, Beijing 100871, China.

Huawei Technologies Co., Ltd., Beijing 100080, China.

Publication information

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac736.

DOI:10.1093/bioinformatics/btac736

PMID:36383167

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9805570/

Abstract

MOTIVATION

Single-cell multi-omics sequencing techniques have developed rapidly in the past few years. Clustering analysis of single-cell multi-omics data may offer novel perspectives for dissecting cellular heterogeneity. However, multi-omics data are inherently high-dimensional and highly sparse, and they contain doublets. Moreover, representations of different omics, even from the same cell, follow diverse distributions. Without proper distribution alignment, clustering methods tend to produce poorly separated clusters that are easily affected by less informative omics data.

RESULTS

We developed MoClust, a novel joint clustering framework that can be applied to several types of single-cell multi-omics data. A selective automatic doublet detection module, introduced in the pretraining stage, identifies and filters out doublets to improve data quality. Omics-specific autoencoders characterize the multi-omics data. A contrastive learning approach to distribution alignment adaptively fuses the omics representations into an omics-invariant representation. This alignment improves the compactness and separability of clusters while accurately weighting the contribution of each omics to the clustering object. Extensive experiments on both simulated and real multi-omics datasets demonstrated the strong alignment, doublet detection and clustering capabilities of MoClust.
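To make the idea of contrastive alignment between omics concrete, the following is a minimal sketch and not the MoClust implementation. It assumes each cell already has one embedding per omics (for example, from two omics-specific encoders) and scores them with a generic one-directional InfoNCE-style loss; the function names, the temperature value and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def contrastive_alignment_loss(z_a, z_b, temperature=0.5):
    """Generic InfoNCE-style loss (an assumption, not the paper's exact loss).

    z_a, z_b: lists of embeddings; row i of each describes the same cell.
    The loss is low when each cell's omics-A embedding is closer to its own
    omics-B embedding than to the omics-B embeddings of other cells.
    """
    z_a = [normalize(v) for v in z_a]
    z_b = [normalize(v) for v in z_b]
    total = 0.0
    for i, anchor in enumerate(z_a):
        # Similarity of cell i (omics A) to every cell (omics B).
        logits = [
            sum(p * q for p, q in zip(anchor, other)) / temperature
            for other in z_b
        ]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the matching cell.
        total += log_denominator - logits[i]
    return total / len(z_a)


# Toy data: 32 cells with 8-dimensional embeddings (hypothetical sizes).
rng = random.Random(0)
n_cells, dim = 32, 8
shared = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_cells)]


def jitter(rows, scale):
    """Add small Gaussian noise to every coordinate."""
    return [[x + rng.gauss(0, scale) for x in row] for row in rows]


aligned_a = jitter(shared, 0.01)
aligned_b = jitter(shared, 0.01)
unrelated_b = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_cells)]

loss_aligned = contrastive_alignment_loss(aligned_a, aligned_b)
loss_unrelated = contrastive_alignment_loss(aligned_a, unrelated_b)
print(f"aligned: {loss_aligned:.3f}, unrelated: {loss_unrelated:.3f}")
```

In this toy setting the loss is smaller when the two omics embeddings of a cell agree, which is the property an alignment objective of this kind rewards. How MoClust itself defines and weights its alignment term is described in the paper and in the released implementation.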

AVAILABILITY AND IMPLEMENTATION

An implementation of MoClust is available from https://doi.org/10.5281/zenodo.7306504.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Clustering single-cell multi-omics data with MoClust.

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac736.

Deep cross-omics cycle attention model for joint analysis of single-cell multi-omics data.

Bioinformatics. 2021 Nov 18;37(22):4091-4099. doi: 10.1093/bioinformatics/btab403.

Con-AAE: contrastive cycle adversarial autoencoders for single-cell multi-omics alignment and integration.

Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad162.

Clustering single-cell multi-omics data via graph regularized multi-view ensemble learning.

Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae169.

scMCs: a framework for single-cell multi-omics data integration and multiple clusterings.

Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad133.

scMIC: A Deep Multi-Level Information Fusion Framework for Clustering Single-Cell Multi-Omics Data.

IEEE J Biomed Health Inform. 2023 Dec;27(12):6121-6132. doi: 10.1109/JBHI.2023.3317272. Epub 2023 Dec 5.

PIntMF: Penalized Integrative Matrix Factorization method for multi-omics data.

Bioinformatics. 2022 Jan 27;38(4):900-907. doi: 10.1093/bioinformatics/btab786.

Dealing with dimensionality: the application of machine learning to multi-omics data.

Bioinformatics. 2023 Feb 3;39(2). doi: 10.1093/bioinformatics/btad021.

Subtype-DCC: decoupled contrastive clustering method for cancer subtype identification based on multi-omics data.

Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad025.

Clustering CITE-seq data with a canonical correlation-based deep learning method.

Front Genet. 2022 Aug 22;13:977968. doi: 10.3389/fgene.2022.977968. eCollection 2022.

Cited by

PLNMFG: Pseudo-label guided non-negative matrix factorization model with graph constraint for single-cell multi-omics data clustering.

PLoS Comput Biol. 2025 Aug 18;21(8):e1013375. doi: 10.1371/journal.pcbi.1013375. eCollection 2025 Aug.

scDRMAE: integrating masked autoencoder with residual attention networks to leverage omics feature dependencies for accurate cell clustering.

Bioinformatics. 2024 Oct 1;40(10). doi: 10.1093/bioinformatics/btae599.

scMNMF: a novel method for single-cell multi-omics clustering based on matrix factorization.

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae228.

Statistical and machine learning methods for immunoprofiling based on single-cell data.

Hum Vaccin Immunother. 2023 Aug 1;19(2):2234792. doi: 10.1080/21645515.2023.2234792. Epub 2023 Jul 24.

References

Multi-omics single-cell data integration and regulatory inference with graph-linked embedding.

Nat Biotechnol. 2022 Oct;40(10):1458-1466. doi: 10.1038/s41587-022-01284-4. Epub 2022 May 2.

A mixture-of-experts deep generative model for integrated analysis of single-cell multiomics data.

Cell Rep Methods. 2021 Sep 15;1(5):100071. doi: 10.1016/j.crmeth.2021.100071. eCollection 2021 Sep 27.

Integrated analysis of multimodal single-cell data.

Cell. 2021 Jun 24;184(13):3573-3587.e29. doi: 10.1016/j.cell.2021.04.048. Epub 2021 May 31.

Deep cross-omics cycle attention model for joint analysis of single-cell multi-omics data.

Bioinformatics. 2021 Nov 18;37(22):4091-4099. doi: 10.1093/bioinformatics/btab403.

Joint probabilistic modeling of single-cell multi-omic data with totalVI.

Nat Methods. 2021 Mar;18(3):272-282. doi: 10.1038/s41592-020-01050-x. Epub 2021 Feb 15.

Benchmarking Computational Doublet-Detection Methods for Single-Cell RNA Sequencing Data.

Cell Syst. 2021 Feb 17;12(2):176-194.e6. doi: 10.1016/j.cels.2020.11.008. Epub 2020 Dec 17.

SHARE-seq reveals chromatin potential.

Nat Rev Genet. 2021 Jan;22(1):2. doi: 10.1038/s41576-020-00308-6.

Single-cell RNA-seq data semi-supervised clustering and annotation via structural regularized domain adaptation.

Bioinformatics. 2021 May 5;37(6):775-784. doi: 10.1093/bioinformatics/btaa908.

Jointly defining cell types from multiple single-cell datasets using LIGER.

Nat Protoc. 2020 Nov;15(11):3632-3662. doi: 10.1038/s41596-020-0391-8. Epub 2020 Oct 12.

Unsupervised topological alignment for single-cell multi-omics integration.

Bioinformatics. 2020 Jul 1;36(Suppl_1):i48-i56. doi: 10.1093/bioinformatics/btaa443.
