Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America.
Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America.
PLoS Comput Biol. 2020 Feb 18;16(2):e1007644. doi: 10.1371/journal.pcbi.1007644. eCollection 2020 Feb.
Methods for the analysis of time series single-cell expression data (scRNA-Seq) either do not use information about transcription factors (TFs) and their targets or study it only as a post-processing step. Using such information can both improve the accuracy of the reconstructed model and cell assignments and provide information on how and when the process is regulated. We developed the Continuous-State Hidden Markov Models TF (CSHMM-TF) method, which integrates probabilistic modeling of scRNA-Seq data with the ability to assign TFs to specific activation points in the model. TFs are assumed to influence the emission probabilities for cells assigned to later time points, allowing us to identify not just the TFs controlling each path but also their order of activation. We tested CSHMM-TF on several mouse and human datasets. The method identified known and novel TFs for all processes, the assigned activation times agree with both expression information and prior knowledge, and the combinatorial predictions are supported by known interactions. We also show that CSHMM-TF improves upon prior methods that do not utilize TF-gene interaction information.
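As an informal illustration of the idea that a TF influences emission probabilities only for cells assigned to later time points, the sketch below models a single target gene on a single path. It is not the CSHMM-TF formulation from the paper. The exponential transition between a start and an end expression level, the Gaussian emission, the grid search, and all names (`expected_expression`, `log_emission`, `best_activation_time`, `rate`, `sigma`) are assumptions introduced here for illustration.

```python
import math


def expected_expression(start, end, t, rate=5.0, activation_time=0.0):
    """Mean expression of one gene at position t in [0, 1] along a path.

    Assumption for this sketch: the gene stays at `start` until the
    regulating TF becomes active at `activation_time`, then moves
    exponentially toward `end`.
    """
    elapsed = max(0.0, t - activation_time)
    weight = math.exp(-rate * elapsed)
    return start * weight + end * (1.0 - weight)


def log_emission(x, start, end, t, sigma=1.0, rate=5.0, activation_time=0.0):
    """Gaussian log-probability of observing expression x for a cell at time t."""
    mu = expected_expression(start, end, t, rate, activation_time)
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def best_activation_time(cells, start, end, candidates, sigma=1.0, rate=5.0):
    """Pick the candidate activation time with the highest total log-likelihood.

    `cells` is a list of (t, x) pairs: assigned path position and observed
    expression of the target gene.
    """
    def total_log_likelihood(activation_time):
        return sum(
            log_emission(x, start, end, t, sigma, rate, activation_time)
            for t, x in cells
        )

    return max(candidates, key=total_log_likelihood)


# Hypothetical noise-free cells whose target gene switches on at t = 0.4.
cells = [(i / 10, expected_expression(0.0, 4.0, i / 10, activation_time=0.4))
         for i in range(11)]
estimated = best_activation_time(cells, 0.0, 4.0, [0.0, 0.2, 0.4, 0.6])
```

In this toy setting, repeating the search for several TFs and sorting them by estimated activation time would give an ordering of the kind the abstract describes; the actual method fits activation points jointly with the model and cell assignments.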