Hu Zhirui, Przytycki Pawel F, Pollard Katherine S
Gladstone Institute of Data Science & Biotechnology, 1650 Owens Street, San Francisco, CA 94158, USA.
Gladstone Institute of Data Science & Biotechnology, 1650 Owens Street, San Francisco, CA 94158, USA; Faculty of Computing & Data Sciences, Boston University, 665 Commonwealth Avenue, Boston, MA 02215, USA.
Cell Genom. 2025 Jul 9;5(7):100886. doi: 10.1016/j.xgen.2025.100886. Epub 2025 May 22.
Tissues are composed of cells with a wide range of similarities to each other, yet existing methods for single-cell genomics treat cell types as discrete labels. To address this gap, we developed CellWalker2, a graph diffusion-based model for the annotation and mapping of multi-modal data. With our open-source software package, hierarchically related cell types can be probabilistically matched across contexts and used to annotate cells, genomic regions, or gene sets. Additional features include estimating statistical significance and jointly modeling gene expression and chromatin accessibility. Through simulation studies, we show that CellWalker2 performs better than existing methods in cell-type annotation and mapping. We then use multi-omics data from the brain and immune system to demonstrate CellWalker2's ability to assign high-resolution cell-type labels to regulatory elements and transcription factors (TFs) and to quantify both conserved and divergent cell-type relationships between species.
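The abstract describes CellWalker2 only as a "graph diffusion-based model". As a rough illustration of that general idea, and not the CellWalker2 package, its API, or its exact algorithm, the sketch below assumes random walk with restart (one common form of graph diffusion) on a toy graph. Hypothetical label nodes are linked to cells that match them, and cells are linked to each other by similarity. Each cell then receives a soft score for every label plus a hard assignment to the label whose diffusion reaches it most strongly. The graph, edge weights, restart probability, and function names are all invented for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def column_normalize(adj: Matrix) -> Matrix:
    """Turn a weighted adjacency matrix into a column-stochastic transition
    matrix: W[i][j] = probability of stepping from node j to node i."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(
    adj: Matrix, seed: int, restart: float = 0.5,
    tol: float = 1e-10, max_iter: int = 1000,
) -> List[float]:
    """Iterate p <- (1 - restart) * W p + restart * e_seed to a fixed point.
    p[i] is the steady-state visiting probability of node i for walks that
    repeatedly jump back to `seed`."""
    n = len(adj)
    w = column_normalize(adj)
    e = [1.0 if i == seed else 0.0 for i in range(n)]
    p = e[:]
    for _ in range(max_iter):
        new_p = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * e[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


def annotate(
    adj: Matrix, label_nodes: Dict[str, int], cell_nodes: List[int],
    restart: float = 0.5,
) -> Tuple[Dict[int, str], Dict[int, Dict[str, float]]]:
    """Diffuse from each (hypothetical) label node; return hard labels and
    per-cell soft scores normalized to sum to 1 across labels."""
    scores = {name: random_walk_with_restart(adj, idx, restart)
              for name, idx in label_nodes.items()}
    soft: Dict[int, Dict[str, float]] = {}
    for cell in cell_nodes:
        total = sum(s[cell] for s in scores.values())
        soft[cell] = {name: s[cell] / total for name, s in scores.items()}
    hard = {cell: max(soft[cell], key=soft[cell].get) for cell in cell_nodes}
    return hard, soft


# Toy graph: nodes 0-1 are hypothetical labels "A" and "B"; nodes 2-5 are cells.
# A links to cell 2 and B links to cell 5 (e.g. marker-gene evidence);
# cells 2-3 and 4-5 are highly similar, cells 3-4 only weakly similar.
edges = [(0, 2, 1.0), (1, 5, 1.0), (2, 3, 1.0), (4, 5, 1.0), (3, 4, 0.1)]
ADJ = [[0.0] * 6 for _ in range(6)]
for i, j, weight in edges:
    ADJ[i][j] = ADJ[j][i] = weight

assignments, soft_scores = annotate(ADJ, {"A": 0, "B": 1}, [2, 3, 4, 5])
print(assignments)  # {2: 'A', 3: 'A', 4: 'B', 5: 'B'}
```

Cells 3 and 4 have no direct label evidence in this toy graph, yet each is annotated through its similarity to a labeled neighbor, and the soft scores record how confidently. In a real analysis the edges would come from expression or accessibility data rather than hand-set weights.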