Venkat Aarthi, Leone Sam, Youlten Scott E, Fagerberg Eric, Attanasio John, Joshi Nikhil S, Perlmutter Michael, Krishnaswamy Smita
Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, USA.
Applied Math Program, Yale University, New Haven, CT, USA.
Nat Comput Sci. 2024 Dec;4(12):955-977. doi: 10.1038/s43588-024-00734-0. Epub 2024 Dec 20.
In single-cell sequencing analysis, several computational methods have been developed to map the cellular state space, but little has been done to map or create embeddings of the gene space. Here we formulate the gene embedding problem, design tasks with simulated single-cell data to evaluate representations, and establish ten relevant baselines. We then present a graph signal processing approach, called gene signal pattern analysis (GSPA), that learns rich gene representations from single-cell data using a dictionary of diffusion wavelets on the cell-cell graph. GSPA enables characterization of genes based on their patterning and localization on the cellular manifold. We motivate and demonstrate the efficacy of GSPA as a framework for diverse biological tasks, such as capturing gene co-expression modules, condition-specific enrichment and perturbation-specific gene-gene interactions. Then we showcase the broad utility of gene representations derived from GSPA, including for cell-cell communication (GSPA-LR), spatial transcriptomics (GSPA-multimodal) and patient response (GSPA-Pt) analysis.
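The core idea stated above, representing each gene by projecting its expression signal onto a dictionary of diffusion wavelets built on the cell-cell graph, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The k-nearest-neighbour graph construction, the lazy random-walk operator, the dyadic scales, the unit-norm scaling of gene signals, and the helper names `knn_affinity`, `diffusion_wavelets` and `gene_embeddings` are all assumptions chosen for illustration, and any downstream dimensionality reduction is omitted.

```python
import numpy as np

def knn_affinity(cells, k=5):
    """Symmetric 0/1 k-nearest-neighbour adjacency from a cells x features matrix.
    (Assumed graph construction; other kernels would work equally well here.)"""
    sq = np.sum(cells ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * cells @ cells.T  # squared distances
    np.fill_diagonal(d2, np.inf)                            # exclude self-matches
    idx = np.argsort(d2, axis=1)[:, :k]                     # k nearest cells per row
    n = cells.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                               # symmetrize

def diffusion_wavelets(A, n_scales=3):
    """Dyadic diffusion wavelets: I - P, P - P^2, P^2 - P^4, ...
    Returns the wavelet list and the final low-pass operator P^(2^n_scales)."""
    n = A.shape[0]
    deg = A.sum(axis=0)
    # Lazy random walk P = (I + A D^-1) / 2; A / deg divides column j by deg[j].
    P = 0.5 * (np.eye(n) + A / deg)
    powers = [np.eye(n), P]
    cur = P
    for _ in range(n_scales):
        cur = cur @ cur                                     # P^2, P^4, P^8, ...
        powers.append(cur)
    wavelets = [powers[j] - powers[j + 1] for j in range(len(powers) - 1)]
    return wavelets, powers[-1]

def gene_embeddings(X, wavelets):
    """Project each gene signal (a column of the cells x genes matrix X)
    onto every wavelet and concatenate the coefficients per gene."""
    norms = np.linalg.norm(X, axis=0)
    Xn = X / np.where(norms > 0, norms, 1.0)                # unit-norm gene signals
    return np.concatenate([(W @ Xn).T for W in wavelets], axis=1)

# Toy usage on synthetic data (not real single-cell measurements).
rng = np.random.default_rng(0)
cells = rng.normal(size=(60, 5))                  # 60 cells in a 5-d latent space
X = rng.poisson(2.0, size=(60, 8)).astype(float)  # 8 gene signals over the cells
X[:, 1] = 3.0 * X[:, 0]                           # gene 1 is a rescaled copy of gene 0

A = knn_affinity(cells, k=5)
wavelets, lowpass = diffusion_wavelets(A, n_scales=3)
emb = gene_embeddings(X, wavelets)
print(emb.shape)  # (8, 240): 8 genes, 4 wavelets x 60 cells
```

In this sketch, genes whose expression is localized on the same region of the graph at the same scales receive nearby embedding vectors, so distances between rows of `emb` can serve as a rough proxy for similarity in patterning. Because the signals are scaled to unit norm, genes 0 and 1 in the toy data map to the same vector.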