Revealing a coherent cell state landscape across single cell datasets with CONCORD.

Author information

Zhu Qin, Jiang Zuzhi, Thomson Matt, Gartner Zev

Affiliations

Department of Pharmaceutical Chemistry, University of California San Francisco; San Francisco, CA 94158, USA.

Tetrad Graduate Program, University of California San Francisco; San Francisco, CA 94158, USA.

Publication information

bioRxiv. 2025 Apr 11:2025.03.13.643146. doi: 10.1101/2025.03.13.643146.

DOI:10.1101/2025.03.13.643146

PMID:40161827

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11952503/

Abstract

Batch integration, denoising, and dimensionality reduction remain fundamental challenges in single-cell data analysis. While many machine learning tools aim to overcome these challenges by engineering model architectures, we use a different strategy, building on the insight that optimized mini-batch sampling during training can profoundly influence learning outcomes. We present CONCORD, a self-supervised learning approach that implements a unified, probabilistic data sampling scheme combining neighborhood-aware and dataset-aware sampling: the former enhances resolution, while the latter removes batch effects. Using only a minimalist one-hidden-layer neural network and contrastive learning, CONCORD achieves state-of-the-art performance without relying on deep architectures, auxiliary losses, or supervision. It generates high-resolution cell atlases that seamlessly integrate data across batches, technologies, and species, without relying on prior assumptions about data structure. The resulting latent representations are denoised, interpretable, and biologically meaningful, capturing gene co-expression programs, resolving subtle cellular states, and preserving both local geometric relationships and global topological organization. We demonstrate CONCORD's broad applicability across diverse datasets, establishing it as a general-purpose framework for learning unified, high-fidelity representations of cellular identity and dynamics.
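The abstract describes the sampling scheme only at a high level. The Python sketch below illustrates one way a single mini-batch could combine dataset-aware and neighborhood-aware sampling. It is an illustration under stated assumptions, not CONCORD's actual implementation: the function name `sample_minibatch`, the parameter `p_knn`, the size-weighted choice of dataset, and the brute-force Euclidean nearest-neighbor search on a precomputed embedding `X` (for example PCA coordinates) are all hypothetical choices made for this example.

```python
import numpy as np


def sample_minibatch(X, dataset_ids, batch_size, p_knn=0.3, rng=None):
    """Hypothetical sketch: draw one mini-batch of cell indices.

    X           : (n_cells, n_features) embedding, e.g. PCA coordinates (assumed).
    dataset_ids : (n_cells,) array of dataset/batch labels.
    p_knn       : hypothetical probability of a neighborhood-aware batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    dataset_ids = np.asarray(dataset_ids)
    labels, counts = np.unique(dataset_ids, return_counts=True)

    # Dataset-aware step (assumed form): choose one dataset, weighted by its
    # size, so every cell in this mini-batch comes from the same dataset.
    chosen = rng.choice(labels, p=counts / counts.sum())
    pool = np.flatnonzero(dataset_ids == chosen)
    size = min(batch_size, pool.size)

    if rng.random() < p_knn:
        # Neighborhood-aware step (assumed form): take the `size` cells
        # closest to a random anchor cell, all within the chosen dataset.
        anchor = rng.choice(pool)
        dists = np.linalg.norm(X[pool] - X[anchor], axis=1)
        return pool[np.argsort(dists)[:size]]

    # Otherwise sample uniformly, without replacement, within the dataset.
    return rng.choice(pool, size=size, replace=False)
```

If a contrastive loss is computed within each such batch, cells are only ever contrasted against cells from the same dataset, so batch-specific signal cannot help tell them apart, while neighborhood batches push the model to discriminate among closely related cells. This is one reading of the intuition stated in the abstract, not a description of the published training loop.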
