Cao Wenjie, Zhang Bengong, Zhou Tianshou
School of Mathematics, Sun Yat-sen University, Guangzhou 510275, China.
School of Mathematics & Statistics, Wuhan Textile University, Wuhan 430200, China.
Bioinform Adv. 2025 Aug 5;5(1):vbaf185. doi: 10.1093/bioadv/vbaf185. eCollection 2025.
Embryonic cells eventually develop into various types of mature cells, a process in which cell fate determination plays a pivotal role, but the dynamic features of this process remain elusive.
We analyze four single-cell RNA sequencing datasets: mouse embryo cells, mouse embryonic fibroblasts, human bone marrow, and intestinal organoids. We show that the key (highly expressed) genes of each organism exhibit different statistical features and expression patterns before and after the branch. For example, in mouse embryo cells, the mRNA distribution of the gene Gata3 is bimodal before the branch, unimodal at the branching point, trimodal on one branch, and bimodal on the other. Moreover, one distribution mode is the same before and after the branch, which may account for the maintenance of genetic information during a complex cell evolution process. Machine learning reveals that, along the cell pseudo-time trajectory, the strength with which one key gene regulates another is essentially increasing before the branch and always monotonically increasing after it; the burst size and burst frequency of key genes are always monotonically decreasing before the branch, but monotonically increasing on one branch and monotonically decreasing on the other. Our results unveil essential features of dynamic cell processes and can supplement existing methods for accurately screening marker genes of cell fate determination.
The implementation of CFD is available at https://github.com/cellwj/CFD and the preprocessed data are available at https://zenodo.org/records/14367638.
Keywords: cell fate determination, single-cell RNA sequencing data, marker gene, cell process, developmental branch.
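To make the burst size and burst frequency mentioned in the abstract concrete, the following is a minimal sketch of a moment-based estimate in consecutive pseudo-time windows. It is not the CFD implementation and not the machine-learning approach the authors describe. It assumes a simple bursty (negative binomial) model of mRNA counts, in which the mean equals burst frequency times burst size and the Fano factor equals one plus the burst size, with burst frequency expressed in units of the mRNA degradation rate. The function names `estimate_burst_parameters` and `burst_parameters_along_pseudotime` are hypothetical.

```python
from statistics import fmean, pvariance


def estimate_burst_parameters(counts):
    """Moment-based burst estimates under an assumed negative binomial model.

    Assumption (not from the paper): mean = frequency * size and
    Fano factor = 1 + size, with frequency in units of the degradation rate.
    Returns (burst_size, burst_frequency).
    """
    mean = fmean(counts)
    if mean <= 0:
        raise ValueError("mean expression must be positive")
    fano = pvariance(counts) / mean  # variance-to-mean ratio
    burst_size = fano - 1.0
    if burst_size <= 0:
        raise ValueError("counts are not over-dispersed; the bursty model does not apply")
    burst_frequency = mean / burst_size
    return burst_size, burst_frequency


def burst_parameters_along_pseudotime(pseudotime, counts, n_windows=2):
    """Order cells by pseudo-time, split them into windows, estimate per window."""
    order = sorted(range(len(counts)), key=lambda i: pseudotime[i])
    chunk = len(order) // n_windows
    results = []
    for w in range(n_windows):
        start = w * chunk
        # The last window absorbs any remainder cells.
        stop = len(order) if w == n_windows - 1 else start + chunk
        window = [counts[i] for i in order[start:stop]]
        results.append(estimate_burst_parameters(window))
    return results
```

Comparing the per-window estimates along one branch gives a rough view of whether burst size and frequency rise or fall with pseudo-time. Real single-cell data would also need normalization and handling of technical noise, which this sketch omits.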