College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China.
BGI Research, Shenzhen, 518083, China.
Commun Biol. 2024 Nov 16;7(1):1523. doi: 10.1038/s42003-024-07238-7.
Pathway analysis is a crucial step in disease research based on single-cell RNA sequencing (scRNA-seq) data, offering biological interpretation grounded in prior knowledge. However, currently available tools for generating cell-level pathway activity scores (PAS) are computationally inefficient on large-scale scRNA-seq datasets. Additionally, disease-related pathways are often identified through cross-condition comparisons within specific cell types, overlooking potential patterns that involve multiple cell types. Here, we present single-cell pathway activity factor analysis (scPAFA), a Python library designed for large-scale single-cell datasets that allows rapid PAS computation and uncovers biologically interpretable disease-related multicellular pathway modules, which are low-dimensional representations of disease-related PAS alterations across multiple cell types. Application to colorectal cancer (CRC) datasets and a large-scale lupus atlas of over 1.2 million cells demonstrated that scPAFA can reduce the runtime of PAS computation by more than 40-fold and can identify reliable and interpretable multicellular pathway modules that capture the heterogeneity of CRC and transcriptional abnormalities in lupus patients, respectively. Overall, scPAFA is a valuable addition to existing tools for disease research, with the potential to reveal complex disease mechanisms and support biomarker discovery at the pathway level.
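To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not scPAFA's implementation or API. It assumes a very simple scoring rule (mean expression of a pathway's measured genes, computed for all cells with one matrix product) and then averages the cell-level scores within each sample and cell type, which is the kind of table a downstream factor model could take as input. All function names, gene symbols, and data here are hypothetical.

```python
import numpy as np


def pathway_activity_scores(expr, gene_names, pathways):
    """Return pathway names and a (cells x pathways) matrix of mean expression.

    expr: (cells x genes) array of normalized expression values.
    gene_names: gene symbols, one per column of expr.
    pathways: dict mapping pathway name -> list of gene symbols.
    NOTE: mean expression is an assumed, simplified scoring rule.
    """
    gene_index = {g: i for i, g in enumerate(gene_names)}
    names = list(pathways)
    # Binary membership matrix (genes x pathways); unmeasured genes are ignored.
    membership = np.zeros((len(gene_names), len(names)))
    for j, name in enumerate(names):
        for gene in pathways[name]:
            if gene in gene_index:
                membership[gene_index[gene], j] = 1.0
    sizes = membership.sum(axis=0)
    sizes[sizes == 0] = np.nan  # pathways with no measured genes score NaN
    # One matrix product scores every cell against every pathway.
    return names, (expr @ membership) / sizes


def aggregate_by_sample_and_cell_type(pas, samples, cell_types):
    """Average cell-level scores within each (sample, cell type) group."""
    samples = np.asarray(samples)
    cell_types = np.asarray(cell_types)
    grouped = {}
    for s in np.unique(samples):
        for c in np.unique(cell_types):
            mask = (samples == s) & (cell_types == c)
            if mask.any():
                grouped[(str(s), str(c))] = pas[mask].mean(axis=0)
    return grouped


# Toy data: 4 cells x 4 genes (hypothetical values).
expr = np.array(
    [[1.0, 2.0, 0.0, 4.0],
     [3.0, 0.0, 1.0, 1.0],
     [0.0, 2.0, 2.0, 2.0],
     [4.0, 4.0, 0.0, 0.0]]
)
genes = ["A", "B", "C", "D"]
pathways = {"P1": ["A", "B"], "P2": ["C", "D", "E"]}  # "E" is not measured

names, pas = pathway_activity_scores(expr, genes, pathways)
grouped = aggregate_by_sample_and_cell_type(
    pas, samples=["s1", "s1", "s2", "s2"], cell_types=["T", "T", "T", "B"]
)
```

Expressing the scoring as a single matrix product is one common way to keep per-cell scoring cheap on large datasets; whether scPAFA does this is not stated in the abstract.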