Benchmarking cross-species single-cell RNA-seq data integration methods: towards a cell type tree of life.

Authors

Zhong Huawen, Han Wenkai, Gomez-Cabrero David, Tegner Jesper, Gao Xin, Cui Guoxin, Aranda Manuel

Affiliations

BioEngineering Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.

Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.

Publication information

Nucleic Acids Res. 2025 Jan 7;53(1). doi: 10.1093/nar/gkae1316.

DOI: 10.1093/nar/gkae1316

PMID: 39778870

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11707536/

Abstract

Cross-species single-cell RNA-seq data hold immense potential for unraveling cell type evolution and transferring knowledge between well-explored and less-studied species. However, challenges arise from interspecific genetic variation, batch effects stemming from experimental discrepancies and inherent individual biological differences. Here, we benchmarked nine data-integration methods across 20 species, encompassing 4.7 million cells, spanning eight phyla and the entire animal taxonomic hierarchy. Our evaluation reveals notable differences between the methods in removing batch effects and preserving biological variance across taxonomic distances. Methods that effectively leverage gene sequence information capture underlying biological variances, while generative model-based approaches excel in batch effect removal. SATURN demonstrates robust performance across diverse taxonomic levels, from cross-genus to cross-phylum, emphasizing its versatility. SAMap excels in integrating species beyond the cross-family level, especially for atlas-level cross-species integration, while scGen shines within or below the cross-class hierarchy. As a result, our analysis offers recommendations and guidelines for selecting suitable integration methods, enhancing cross-species single-cell RNA-seq analyses and advancing algorithm development.

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0d4c/11707536/3e045a0ac758/gkae1316figgra1.jpg
