Integration and transfer learning of single-cell transcriptomes via cFIT.

Affiliations

Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213.

Neurogenetics Program, University of California, Los Angeles, CA 90095.

Publication information

Proc Natl Acad Sci U S A. 2021 Mar 9;118(10). doi: 10.1073/pnas.2024383118.

DOI:10.1073/pnas.2024383118

PMID:33658382

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7958425/

Abstract

Large, comprehensive collections of single-cell RNA sequencing (scRNA-seq) datasets have been generated that allow for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets or transfer knowledge from one to the other to better understand cellular identity and functions. Here, we present a simple yet surprisingly effective method named common factor integration and transfer learning (cFIT) for capturing various batch effects across experiments, technologies, subjects, and even species. The proposed method models the shared information between various datasets by a common factor space while allowing for unique distortions and shifts in genewise expression in each batch. The model parameters are learned under an iterative nonnegative matrix factorization (NMF) framework and then used for synchronized integration from across-domain assays. In addition, the model enables transferring via low-rank matrix from more informative data to allow for precise identification in data of lower quality. Compared with existing approaches, our method imposes weaker assumptions on the cell composition of each individual dataset; however, it is shown to be more reliable in preserving biological variations. We apply cFIT to multiple scRNA-seq datasets of developing brain from human and mouse, varying by technologies and developmental stages. The successful integration and transfer uncover the transcriptional resemblance across systems. The study helps establish a comprehensive landscape of brain cell-type diversity and provides insights into brain development.
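The model structure described in the abstract (a shared factor space plus batch-specific genewise distortions and shifts) can be illustrated with a small sketch. It assumes, for illustration only, that batch j is generated as X_j ≈ (H_j Wᵀ) scaled per gene by λ_j and shifted per gene by b_j; the function names, the toy dimensions, and the parameter values are all hypothetical. The sketch does not perform the iterative NMF fitting; it only shows that, once the scale and shift are known, undoing them maps both batches back onto the same shared representation.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def simulate_batch(h, w, scale, shift):
    """Assumed generative form: X = (H W^T) * scale + shift, applied per gene (column)."""
    w_t = [list(col) for col in zip(*w)]   # W is genes x factors, so W^T is factors x genes
    shared = matmul(h, w_t)                # cells x genes, the batch-free signal
    return [[v * s + b for v, s, b in zip(row, scale, shift)] for row in shared]

def correct_batch(x, scale, shift):
    """Undo the genewise shift and scaling to return a batch to the shared space."""
    return [[(v - b) / s for v, s, b in zip(row, scale, shift)] for row in x]

def max_abs_gap(a, b):
    """Largest entrywise absolute difference between two equally shaped matrices."""
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))

random.seed(0)
n_cells, n_genes, n_factors = 3, 4, 2

# Nonnegative common factors W (genes x factors) and cell loadings H (cells x factors).
w = [[random.random() for _ in range(n_factors)] for _ in range(n_genes)]
h = [[random.random() for _ in range(n_factors)] for _ in range(n_cells)]

# Hypothetical batch-specific genewise scalings and shifts.
scale_1, shift_1 = [1.0, 2.0, 0.5, 1.5], [0.0, 0.1, 0.3, 0.2]
scale_2, shift_2 = [0.8, 1.2, 2.0, 1.0], [0.5, 0.0, 0.1, 0.4]

# The same cells "measured" in two batches look different before correction.
x1 = simulate_batch(h, w, scale_1, shift_1)
x2 = simulate_batch(h, w, scale_2, shift_2)

c1 = correct_batch(x1, scale_1, shift_1)
c2 = correct_batch(x2, scale_2, shift_2)

max_gap_raw = max_abs_gap(x1, x2)
max_gap_corrected = max_abs_gap(c1, c2)
print("largest gap between batches before correction:", max_gap_raw)
print("largest gap between batches after correction:", max_gap_corrected)
```

In practice the factors, scalings, and shifts are unknown and must be estimated from the data, which is the role of the iterative NMF procedure mentioned in the abstract.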

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6753/7958425/490bdf8a427c/pnas.2024383118fig01.jpg
