OutPredict: multiple datasets can improve prediction of expression and inference of causality.

Affiliations

Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, NY, 10012, USA.

Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA.

Publication information

Sci Rep. 2020 Apr 22;10(1):6804. doi: 10.1038/s41598-020-63347-3.

Abstract

The ability to accurately predict the causal relationships from transcription factors to genes would greatly enhance our understanding of transcriptional dynamics. This could lead to applications in which one or more transcription factors are manipulated to effect a change in genes, leading to the enhancement of some desired trait. Here we present a method called OutPredict that constructs a model for each gene based on time series (and other) data and predicts that gene's expression at a previously unseen subsequent time point. The model also infers causal relationships based on the most important transcription factors for each gene model, some of which have been validated by previous physical experiments. The method benefits from known network edges and steady-state data to enhance predictive accuracy. Our results across B. subtilis, Arabidopsis, E. coli, Drosophila, and the DREAM4 simulated in silico dataset show improvements in predictive accuracy ranging from 40% to 60% over other state-of-the-art methods. We find that gene expression models can benefit from the addition of steady-state data when predicting expression values in time series. Finally, we validate, based on limited available data, that the influential edges we infer correspond to known relationships significantly more often than expected by chance or than edges inferred by state-of-the-art methods.
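The per-gene workflow described in the abstract (fit a model on time series, predict a held-out later time point, rank transcription factors by influence) can be illustrated with a minimal sketch. This is not the OutPredict implementation: the abstract does not state the model class, so the sketch substitutes a simple ridge regression, uses synthetic data, and ignores prior network edges and steady-state data. All function names, TF names, and parameters below are hypothetical.

```python
import numpy as np

def make_toy_data(n_times=30, seed=0):
    """Synthetic time series (assumption): 3 TFs drive one target gene with a one-step lag."""
    rng = np.random.default_rng(seed)
    tf_expr = rng.normal(size=(n_times, 3))
    gene_expr = np.zeros(n_times)
    # The gene at time t+1 depends on TF_A and TF_C at time t; TF_B has no effect.
    gene_expr[1:] = 2.0 * tf_expr[:-1, 0] - 0.5 * tf_expr[:-1, 2]
    gene_expr += rng.normal(scale=0.01, size=n_times)
    return tf_expr, gene_expr

def fit_gene_model(tf_expr, gene_expr, ridge=1e-2):
    """Fit gene_expr[t+1] ~ tf_expr[t] by closed-form ridge regression (a stand-in model)."""
    X = tf_expr[:-1]   # TF levels at times 0..T-2
    y = gene_expr[1:]  # target gene levels at times 1..T-1
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def predict_next(w, b, tf_now):
    """Predict the gene's expression one step after the given TF levels."""
    return float(tf_now @ w + b)

def rank_tfs(w, tf_names):
    """Order TFs by absolute coefficient, a crude proxy for influence."""
    order = np.argsort(-np.abs(w))
    return [tf_names[i] for i in order]

tf_names = ["TF_A", "TF_B", "TF_C"]
tf_expr, gene_expr = make_toy_data()

# Hold out the final time point: train only on earlier transitions.
w, b = fit_gene_model(tf_expr[:-1], gene_expr[:-1])
predicted = predict_next(w, b, tf_expr[-2])

print("predicted:", predicted, "observed:", gene_expr[-1])
print("TFs ranked by influence:", rank_tfs(w, tf_names))
```

Holding out the last time point mirrors the abstract's "previously unseen subsequent time point" evaluation; ranking by coefficient magnitude is only meaningful here because the toy TF series share the same scale.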

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac8c/7176633/caff6bf740cb/41598_2020_63347_Fig1_HTML.jpg
