Wang Junchao, Sun Ling, Wei Nana, Huang Yisheng, Zhang Naiqian
School of Mathematics and Statistics, Shandong University at Weihai, Weihai, Shandong 264209, China.
Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Lymphoma, Peking University Cancer Hospital & Institute, Beijing 100142, China.
Bioinformatics. 2025 Sep 1;41(9). doi: 10.1093/bioinformatics/btaf454.
Trajectory inference methods are essential for recovering a temporal ordering of cells from static single-cell transcriptomic profiles, which enables accurate delineation of cellular developmental hierarchies and cell-fate transitions. However, many existing methods treat trajectory inference as an unsupervised learning task. This makes them susceptible to technical noise and data sparsity, which often lead to unstable reconstructions and ambiguous lineage assignments.
Here, we introduce BayesTraj, a semi-supervised Bayesian framework that incorporates prior knowledge of lineage topology and marker-gene expression to robustly reconstruct differentiation trajectories from scRNA-seq data. BayesTraj models cellular differentiation as a probabilistic mixture of latent lineages and captures marker-gene dynamics through parametric functions. Posterior inference is conducted using Hamiltonian Monte Carlo (HMC), yielding estimates of pseudotime, lineage proportions, and gene activation parameters. Evaluations on both simulated and real datasets with diverse branching structures demonstrate that BayesTraj consistently outperforms state-of-the-art methods in pseudotime inference. In addition, it provides per-cell branch-assignment probabilities, enabling the quantification of differentiation potential using Shannon entropy and the detection of lineage-specific gene expression via Bayesian model comparison.
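The abstract does not state BayesTraj's exact likelihood or activation functions, so the following is only a minimal sketch under stated assumptions, not the package's implementation. It shows how per-cell branch-assignment probabilities can follow from a lineage mixture via Bayes' rule, and how their Shannon entropy scores differentiation potential. Assumptions (all hypothetical): each marker gene's mean expression on a lineage follows a sigmoid switch with steepness `k`, midpoint `t0` and amplitude `mu0`; noise is Gaussian with a shared standard deviation; and pseudotime and parameters are held fixed, whereas BayesTraj samples them with HMC. The function names and parameter values are illustrative only.

```python
import math

def sigmoid_activation(t, k, t0, mu0):
    """Hypothetical switch-like marker-gene mean: rises from 0 to mu0 around pseudotime t0."""
    return mu0 / (1.0 + math.exp(-k * (t - t0)))

def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution N(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)

def branch_probabilities(expr, t, lineages, weights, sd=1.0):
    """P(lineage b | cell) proportional to w_b * prod_g N(expr_g | f_bg(t), sd^2).

    expr: observed expression of each marker gene for one cell.
    t: the cell's pseudotime (fixed here; a full model would infer it).
    lineages: per lineage, a list of (k, t0, mu0) tuples, one per marker gene.
    weights: lineage mixture proportions (summing to 1).
    """
    log_post = []
    for w, params in zip(weights, lineages):
        lp = math.log(w)
        for x, (k, t0, mu0) in zip(expr, params):
            lp += gaussian_logpdf(x, sigmoid_activation(t, k, t0, mu0), sd)
        log_post.append(lp)
    # Log-sum-exp normalization keeps the computation stable with many genes.
    m = max(log_post)
    unnorm = [math.exp(v - m) for v in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def shannon_entropy(probs):
    """Shannon entropy (in nats) of a cell's branch-assignment probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical toy setup: two lineages, two marker genes.
# Lineage A switches on gene 1; lineage B switches on gene 2.
lineages = [
    [(10.0, 0.5, 5.0), (10.0, 0.5, 0.0)],  # lineage A
    [(10.0, 0.5, 0.0), (10.0, 0.5, 5.0)],  # lineage B
]
weights = [0.5, 0.5]

# An early cell (before either switch) is ambiguous; a late cell is committed.
p_early = branch_probabilities([0.05, 0.05], 0.1, lineages, weights)
p_late = branch_probabilities([5.0, 0.0], 0.9, lineages, weights)
print("early:", p_early, "entropy:", shannon_entropy(p_early))
print("late:", p_late, "entropy:", shannon_entropy(p_late))
```

In this toy example the early cell receives roughly equal probability for both branches, so its entropy is close to ln 2 (maximal uncertainty between two lineages), while the late cell is assigned almost entirely to lineage A and its entropy is close to 0. In a fully Bayesian treatment these probabilities would be averaged over posterior draws of pseudotime and gene parameters rather than computed from point values.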
BayesTraj is written in R and available at https://github.com/SDU-W-Zhanglab/BayesTraj and has been archived on Zenodo (DOI: 10.5281/zenodo.16758038).