Identifying similar populations across independent single cell studies without data integration.

Author information

González-Velasco Oscar, Simon Malte, Yilmaz Rüstem, Parlato Rosanna, Weishaupt Jochen, Imbusch Charles D, Brors Benedikt

Affiliations

Division Applied Bioinformatics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany.

Division of Neurodegenerative Disorders, Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neurosciences, Heidelberg University, 68167 Mannheim, Germany.

Publication information

NAR Genom Bioinform. 2025 Apr 24;7(2):lqaf042. doi: 10.1093/nargab/lqaf042. eCollection 2025 Jun.

Abstract

Supervised and unsupervised methods have emerged to address the complexity of single-cell data analysis in the context of large pools of independent studies. Here, we present ClusterFoldSimilarity (CFS), a novel statistical method designed to quantify the similarity between cell groups across any number of independent datasets, without the need for data correction or integration. By bypassing these processes, CFS avoids introducing artifacts and losing information, offering a simple, efficient, and scalable solution. The method matches groups of cells that exhibit conserved phenotypes across datasets, including different tissues and species, and works in a multimodal setting: single-cell RNA-Seq, ATAC-Seq, single-cell proteomics or, more broadly, any data exhibiting differential abundance effects among groups of cells. Additionally, CFS performs feature selection, obtaining cross-dataset markers of the similar phenotypes observed and providing inherent interpretability of the relationships between cell populations. To showcase the effectiveness of our methodology, we generated single-nuclei RNA-Seq data from the motor cortex and spinal cord of adult mice. Using CFS, we identified three distinct sub-populations of astrocytes conserved in both tissues. CFS includes various visualization methods for interpreting the similarity scores and the similar cell populations.
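
The abstract does not spell out how CFS computes its scores, so the following is only an illustration of the general idea of comparing cell groups without integration, not the authors' algorithm. Each dataset is summarized on its own as per-cluster log2 fold changes (cluster versus all other cells) over shared features, and clusters are then compared across datasets by the cosine similarity of those signatures. The function names, the cluster-versus-rest fold change, the cosine score, the pseudocount, the product-based marker ranking (a hypothetical stand-in for CFS feature selection) and the toy data are all assumptions.

```python
import numpy as np


def fold_change_signatures(expr, labels, pseudocount=1.0):
    """Log2 fold change of each cluster's mean versus the mean of all other cells.

    expr: cells x features array of non-negative abundances from ONE dataset.
    labels: cluster label of each cell in that dataset.
    Returns (sorted cluster names, clusters x features signature matrix).
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    names = np.unique(labels)
    sigs = []
    for c in names:
        inside = labels == c
        mean_in = expr[inside].mean(axis=0)
        mean_out = expr[~inside].mean(axis=0)
        sigs.append(np.log2((mean_in + pseudocount) / (mean_out + pseudocount)))
    return names, np.vstack(sigs)


def cross_dataset_similarity(sig_a, sig_b):
    """Cosine similarity between every cluster signature of A and of B (assumed score)."""
    na = np.maximum(np.linalg.norm(sig_a, axis=1, keepdims=True), 1e-12)
    nb = np.maximum(np.linalg.norm(sig_b, axis=1, keepdims=True), 1e-12)
    return (sig_a / na) @ (sig_b / nb).T


def shared_markers(sig_a_row, sig_b_row, features, top_n=3):
    """Hypothetical marker ranking: features whose fold changes agree most strongly."""
    contrib = sig_a_row * sig_b_row
    order = np.argsort(contrib)[::-1][:top_n]
    return [features[k] for k in order]


def simulate(rng, clusters, scale, n_per=50, n_genes=6):
    """Hypothetical toy data: cluster k over-expresses genes 2k and 2k+1."""
    rows, labels = [], []
    for name, k in clusters:
        rate = np.ones(n_genes)
        rate[2 * k:2 * k + 2] = 10.0
        rows.append(rng.poisson(scale * rate, size=(n_per, n_genes)))
        labels += [name] * n_per
    return np.vstack(rows), np.array(labels)


def demo(seed=0):
    rng = np.random.default_rng(seed)
    genes = [f"g{k}" for k in range(6)]
    # Dataset B uses different cluster names/order and 5x higher overall counts.
    expr_a, lab_a = simulate(rng, [("A0", 0), ("A1", 1), ("A2", 2)], scale=1.0)
    expr_b, lab_b = simulate(rng, [("Bx", 2), ("By", 0), ("Bz", 1)], scale=5.0)
    names_a, sig_a = fold_change_signatures(expr_a, lab_a)
    names_b, sig_b = fold_change_signatures(expr_b, lab_b)
    sim = cross_dataset_similarity(sig_a, sig_b)
    best = {str(a): str(names_b[sim[i].argmax()]) for i, a in enumerate(names_a)}
    # Markers shared by cluster A0 and its best match in B.
    j = int(sim[0].argmax())
    markers = shared_markers(sig_a[0], sig_b[j], genes, top_n=2)
    return best, markers


if __name__ == "__main__":
    best, markers = demo()
    print(best)     # expected: {'A0': 'By', 'A1': 'Bz', 'A2': 'Bx'}
    print(markers)  # g0 and g1 (order may vary)
```

Because each fold change is a ratio computed within a single dataset, the 5x scaling of dataset B in this toy example changes the magnitude of its signatures far more than their direction, so matching by cosine similarity still pairs the corresponding clusters without any cross-dataset correction.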

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63b8/12019640/abbf014954e4/lqaf042fig1.jpg