Department of Statistics, Ecology and Evolution, Molecular Genetics & Cell Biology, Institute of Genomics and Systems Biology, University of Chicago, Chicago, IL 60637, USA.
Department of Human Genetics, Ecology and Evolution, Molecular Genetics & Cell Biology, Institute of Genomics and Systems Biology, University of Chicago, Chicago, IL 60637, USA.
Bioinformatics. 2020 Jul 1;36(Suppl_1):i499-i507. doi: 10.1093/bioinformatics/btaa506.
The universal expressibility assumption of Deep Neural Networks (DNNs) is the key motivation behind recent works in the systems biology community to employ DNNs to solve important problems in functional genomics and molecular genetics. Typically, such investigations have taken a 'black box' approach in which the internal structure of the model used is set purely by machine learning considerations, with little consideration of representing the internal structure of the biological system by the mathematical structure of the DNN. DNNs have not yet been applied to the detailed modeling of transcriptional control in which mRNA production is controlled by the binding of specific transcription factors to DNA, in part because such models are partly formulated in terms of specific chemical equations that appear different in form from those used in neural networks.
In this paper, we give an example of a DNN which can model the detailed control of transcription in a precise and predictive manner. Its internal structure is fully interpretable and is faithful to the underlying chemistry of transcription factor binding to DNA. We derive our DNN from a systems biology model that was not previously recognized as having a DNN structure. Although we apply our DNN to data from the early embryo of the fruit fly Drosophila, this system serves as a test bed for analysis of much larger datasets obtained by systems biology studies on a genomic scale.
The implementation and data for the models used in this paper are in a zip file in the supplementary material.
Supplementary data are available at Bioinformatics online.