Cao Lan, Zhang Wenhao, Yang Fan, Chen Shengquan, Huang Xiaobing, Zeng Feng, Wang Ying
Department of Automation, Xiamen University, Xiang'an South Road, Xiang'an, 361102, Xiamen, Fujian, China.
National Institute for Data Science in Health and Medicine, Xiamen University, Xiang'an South Road, Xiang'an, 361102, Xiamen, Fujian, China.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf013.
Understanding cell destiny requires unraveling the intricate mechanisms of gene regulation, in which transcription factors (TFs) play a pivotal role. However, the actual contribution of a TF, that is, its TF activity, is determined not only by TF expression but also by the accessibility of the corresponding chromatin regions. We therefore introduce BIOTIC, an advanced Bayesian model with a well-established gene regulation structure that uses single-cell multi-omics data to model the gene expression process under the control of regulatory elements, and thereby defines the regulatory activity of TFs through variational inference. We demonstrate that the TF activity inferred by BIOTIC can serve as a characterization of cell identity and outperforms baseline methods on cell typing, cell development tracking, and batch effect correction. Additionally, BIOTIC trained on multi-omics data can be applied flexibly when only single-cell transcriptome sequencing is available: it infers TF activity and annotates cell types by mapping query cells into the reference TF activity space, an emerging application of cell atlases. The structure of BIOTIC can also be adapted to include additional biological factors, allowing more flexible and comprehensive gene regulation analysis. BIOTIC introduces a pioneering biological-mechanism-driven framework for inferring TF activity and elucidating cell identity states at the gene regulatory level, paving the way for a deeper understanding of the complex interplay between TFs and gene expression in living systems.
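The idea of annotating query cells by mapping them into a reference TF activity space can be illustrated with a small sketch. The abstract does not say how BIOTIC performs this mapping, so the nearest-neighbour majority vote below is only an assumed stand-in, and the function name `annotate_by_knn`, the toy activity matrices, and the labels `type_A` and `type_B` are all hypothetical.

```python
import numpy as np
from collections import Counter


def annotate_by_knn(ref_activity, ref_labels, query_activity, k=5):
    """Label each query cell by majority vote among its k nearest reference cells.

    Hypothetical helper: rows are cells, columns are per-TF activity scores.
    This is a generic k-NN label transfer, not BIOTIC's actual procedure.
    """
    ref = np.asarray(ref_activity, dtype=float)
    qry = np.asarray(query_activity, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_reference).
    dists = np.linalg.norm(qry[:, None, :] - ref[None, :, :], axis=2)
    # Indices of the k closest reference cells for every query cell.
    nearest = np.argsort(dists, axis=1)[:, :k]
    return [
        Counter(ref_labels[j] for j in row).most_common(1)[0][0]
        for row in nearest
    ]


# Toy reference: two well-separated cell types in a 3-TF activity space.
rng = np.random.default_rng(0)
ref_activity = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 3)),  # assumed "type_A" cells
    rng.normal(1.0, 0.1, size=(20, 3)),  # assumed "type_B" cells
])
ref_labels = ["type_A"] * 20 + ["type_B"] * 20

# Toy query cells whose TF activity has already been inferred.
query_activity = np.array([[0.05, -0.02, 0.00],
                           [0.95, 1.10, 1.00]])

predicted = annotate_by_knn(ref_activity, ref_labels, query_activity, k=5)
print(predicted)
```

In this toy setting the two query cells sit close to one reference cluster each, so the vote is unambiguous. With real data, the choice of distance and of `k` would need validation against known annotations.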