Clustering single-cell multi-omics data with MoClust.

Affiliations

Center for Quantitative Biology, Peking University, Beijing 100871, China.

Huawei Technologies Co., Ltd., Beijing 100080, China.

Publication information

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac736.

Abstract

MOTIVATION

Single-cell multi-omics sequencing techniques have developed rapidly in the past few years. Clustering single-cell multi-omics data may offer new perspectives for dissecting cellular heterogeneity. However, multi-omics data are inherently high-dimensional and highly sparse, and they contain doublets. Moreover, representations of different omics follow diverse distributions even when they come from the same cell. Without proper distribution alignment, clustering methods tend to produce poorly separated clusters that are easily affected by less informative omics data.

RESULTS

We developed MoClust, a novel joint clustering framework that can be applied to several types of single-cell multi-omics data. A selective automatic doublet detection module, introduced in the pretraining stage, identifies and filters out doublets to improve data quality. Omics-specific autoencoders characterize the multi-omics data. Distribution alignment based on contrastive learning adaptively fuses the omics representations into an omics-invariant representation. This alignment improves the compactness and separability of clusters while accurately weighting the contribution of each omics to the clustering objective. Extensive experiments on both simulated and real multi-omics datasets demonstrated the alignment, doublet detection and clustering abilities of MoClust.
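The abstract does not give the exact form of the contrastive objective, so the following is only a minimal sketch of the general idea, assuming a standard InfoNCE-style loss rather than MoClust's actual formulation. It treats the embeddings of the same cell from two omics as a positive pair and all other cells as negatives. The function names (`l2_normalize`, `info_nce`), the temperature value and the toy embeddings are hypothetical and chosen for illustration; plain Python is used to avoid external dependencies.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(z_a, z_b, temperature=0.5):
    """Mean InfoNCE loss pulling z_a[i] toward z_b[i] and away from z_b[j != i].

    z_a, z_b: lists of equal-length embedding vectors, one per cell, as they
    might come from two omics-specific encoders (an assumption of this sketch).
    """
    if len(z_a) != len(z_b):
        raise ValueError("both omics must embed the same cells")
    a = [l2_normalize(v) for v in z_a]
    b = [l2_normalize(v) for v in z_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of cell i (omics A) to every cell (omics B).
        logits = [
            sum(x * y for x, y in zip(anchor, other)) / temperature
            for other in b
        ]
        # Numerically stable log-sum-exp over all candidate cells.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching cell.
        total += log_denom - logits[i]
    return total / len(a)


# Toy embeddings for three cells; the values are made up for illustration.
rna = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
protein_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# Same protein embeddings, but assigned to the wrong cells.
protein_shuffled = [protein_aligned[1], protein_aligned[2], protein_aligned[0]]

loss_aligned = info_nce(rna, protein_aligned)
loss_shuffled = info_nce(rna, protein_shuffled)
```

In this toy case the loss is lower when each cell's two omics embeddings point in similar directions than when they are mismatched, which is the behaviour an alignment objective of this kind rewards.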

AVAILABILITY AND IMPLEMENTATION

An implementation of MoClust is available from https://doi.org/10.5281/zenodo.7306504.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1ba3/9805570/29e1056cad32/btac736f1.jpg
