Benchmarking cross-species single-cell RNA-seq data integration methods: towards a cell type tree of life.

Authors

Zhong Huawen, Han Wenkai, Gomez-Cabrero David, Tegner Jesper, Gao Xin, Cui Guoxin, Aranda Manuel

Affiliations

BioEngineering Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.

Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.

Publication information

Nucleic Acids Res. 2025 Jan 7;53(1). doi: 10.1093/nar/gkae1316.

Abstract

Cross-species single-cell RNA-seq data hold immense potential for unraveling cell type evolution and transferring knowledge between well-explored and less-studied species. However, challenges arise from interspecific genetic variation, batch effects stemming from experimental discrepancies and inherent individual biological differences. Here, we benchmarked nine data-integration methods across 20 species, encompassing 4.7 million cells, spanning eight phyla and the entire animal taxonomic hierarchy. Our evaluation reveals notable differences between the methods in removing batch effects and preserving biological variance across taxonomic distances. Methods that effectively leverage gene sequence information capture underlying biological variances, while generative model-based approaches excel in batch effect removal. SATURN demonstrates robust performance across diverse taxonomic levels, from cross-genus to cross-phylum, emphasizing its versatility. SAMap excels in integrating species beyond the cross-family level, especially for atlas-level cross-species integration, while scGen shines within or below the cross-class hierarchy. As a result, our analysis offers recommendations and guidelines for selecting suitable integration methods, enhancing cross-species single-cell RNA-seq analyses and advancing algorithm development.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0d4c/11707536/3e045a0ac758/gkae1316figgra1.jpg
