Integrating single-cell data with biological variables.

Authors

Zhou Yang, Sheng Qiongyu, Jin Shuilin

Affiliations

School of Mathematics, Harbin Institute of Technology, Harbin 150001, China.

Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou 450000, China.

Publication information

Proc Natl Acad Sci U S A. 2025 May 6;122(18):e2416516122. doi: 10.1073/pnas.2416516122. Epub 2025 Apr 28.

Abstract

Constructing single-cell atlases requires preserving differences attributable to biological variables, such as cell types, tissue origins, and disease states, while eliminating batch effects. However, existing methods are inadequate in explicitly modeling these biological variables. Here, we introduce SIGNAL, a general framework that leverages biological variables to disentangle biological and technical effects, thereby linking these metadata to data integration. SIGNAL employs a variant of principal component analysis to align multiple batches, enabling the integration of 1 million cells in approximately 2 min. SIGNAL, despite its computational simplicity, surpasses state-of-the-art methods across multiple integration scenarios: 1) heterogeneous datasets, 2) cross-species datasets, 3) simulated datasets, 4) integration on low-quality cell annotations, and 5) reference-based integration. Furthermore, we demonstrate that SIGNAL accurately transfers knowledge from reference to query datasets. Notably, we propose a self-adjustment strategy to restore annotated cell labels potentially distorted during integration. Finally, we apply SIGNAL to multiple large-scale atlases, including a human heart cell atlas containing 2.7 million cells, identifying tissue- and developmental stage-specific subtypes, as well as condition-specific cell states. This underscores SIGNAL's exceptional capability in multiscale analysis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea50/12067276/455d782b2e8a/pnas.2416516122fig01.jpg