Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, NY, 10012, USA.
Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA.
Sci Rep. 2020 Apr 22;10(1):6804. doi: 10.1038/s41598-020-63347-3.
The ability to accurately predict the causal relationships from transcription factors to genes would greatly enhance our understanding of transcriptional dynamics. This could lead to applications in which one or more transcription factors are manipulated to effect a change in genes that enhances some desired trait. Here we present a method called OutPredict that constructs a model for each gene based on time series (and other) data and predicts that gene's expression at a previously unseen subsequent time point. The model also infers causal relationships from the most important transcription factors in each gene model, some of which have been validated by previous physical experiments. The method benefits from known network edges and steady-state data to enhance predictive accuracy. Our results across B. subtilis, Arabidopsis, E. coli, Drosophila and the DREAM4 simulated in silico dataset show improved predictive accuracy ranging from 40% to 60% over other state-of-the-art methods. We find that gene expression models can benefit from the addition of steady-state data to predict expression values of time series. Finally, we validate, based on limited available data, that the influential edges we infer correspond to known relationships significantly more often than expected by chance or than edges inferred by state-of-the-art methods.
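The per-gene setup described above (fit one model per gene on time series data, predict a held-out later time point, rank transcription factors by influence) can be illustrated with a minimal sketch. This is not the OutPredict implementation: the abstract does not specify the model class, so a ridge-regularised linear regression is swapped in as a stand-in, and the function names, the synthetic data, and the TF labels are all hypothetical.

```python
import numpy as np


def fit_gene_model(tf_expr, gene_expr, ridge=1e-3):
    """Fit gene_expr[t+1] ~ tf_expr[t] by ridge-regularised least squares.

    tf_expr:   (T, n_tfs) transcription factor expression over T time points
    gene_expr: (T,) expression of one target gene over the same time points
    Returns (weights, intercept). Stand-in model, not the authors' method.
    """
    X = tf_expr[:-1]   # TF levels at time t
    y = gene_expr[1:]  # target gene level at time t + 1
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + ridge * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def predict_next(tf_now, weights, intercept):
    """Predict the gene's expression at the time point after tf_now."""
    return float(tf_now @ weights + intercept)


def rank_tfs(weights, tf_names):
    """Order TFs by absolute weight (assumes TFs are on a comparable scale)."""
    order = np.argsort(-np.abs(weights))
    return [tf_names[i] for i in order]


# Synthetic example (assumption): TF_A activates and TF_C weakly represses.
rng = np.random.default_rng(0)
T = 30
tf_names = ["TF_A", "TF_B", "TF_C"]
tf = rng.normal(size=(T, 3))
gene = np.empty(T)
gene[0] = 0.0
gene[1:] = 2.0 * tf[:-1, 0] - 0.5 * tf[:-1, 2] + 0.05 * rng.normal(size=T - 1)

# Hold out the final time point, fit on the rest, then predict it.
w, b = fit_gene_model(tf[:-1], gene[:-1])
pred = predict_next(tf[-2], w, b)
print("predicted:", pred, "observed:", gene[-1])
print("TF ranking:", rank_tfs(w, tf_names))
```

In this sketch the ranking of TFs by weight plays the role of the "most important transcription factors" from which candidate causal edges would be read off; a nonlinear model with its own importance measure could be substituted without changing the train-then-predict-the-unseen-time-point structure.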