Department of Medicine, University of California San Diego, La Jolla, California, USA.
Program in Bioinformatics, University of California San Diego, La Jolla, California, USA.
Nat Methods. 2018 Apr;15(4):290-298. doi: 10.1038/nmeth.4627. Epub 2018 Mar 5.
Although artificial neural networks are powerful classifiers, their internal structures are hard to interpret. In the life sciences, extensive knowledge of cell biology provides an opportunity to design visible neural networks (VNNs) that couple the model's inner workings to those of real systems. Here we develop DCell, a VNN embedded in the hierarchical structure of 2,526 subsystems comprising a eukaryotic cell (http://d-cell.ucsd.edu/). Trained on several million genotypes, DCell simulates cellular growth nearly as accurately as laboratory observations. During simulation, genotypes induce patterns of subsystem activities, enabling in silico investigations of the molecular mechanisms underlying genotype-phenotype associations. These mechanisms can be validated, and many are unexpected; some are governed by Boolean logic. Cumulatively, 80% of the importance for growth prediction is captured by 484 subsystems (21%), reflecting the emergence of a complex phenotype. DCell provides a foundation for decoding the genetics of disease, drug resistance and synthetic life.
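To make the idea of a visible neural network concrete, the following is a minimal sketch in plain Python, not the DCell implementation. It assumes a hypothetical four-subsystem hierarchy, a genotype given as the set of disrupted genes, three neurons per subsystem, tanh activations and random untrained weights. None of these choices come from the abstract. The sketch shows only the structural principle: each subsystem receives input from the genes annotated to it and from its child subsystems, so every intermediate activity is tied to a named subsystem.

```python
import math
import random

# Hypothetical toy hierarchy (NOT the real 2,526-subsystem DCell ontology).
# Each subsystem lists the genes annotated directly to it and its children.
HIERARCHY = {
    "complex_A": {"genes": ["g1", "g2"], "children": []},
    "complex_B": {"genes": ["g3"], "children": []},
    "pathway": {"genes": ["g4"], "children": ["complex_A", "complex_B"]},
    "cell": {"genes": [], "children": ["pathway"]},
}
# Evaluation order: every child appears before its parent.
ORDER = ["complex_A", "complex_B", "pathway", "cell"]
NEURONS = 3  # neurons per subsystem (assumed for illustration)


def init_weights(seed=0):
    """Create random, untrained weights: NEURONS rows per subsystem."""
    rng = random.Random(seed)
    weights = {}
    for name, node in HIERARCHY.items():
        n_inputs = len(node["genes"]) + NEURONS * len(node["children"])
        # The last entry of each row is the bias term.
        weights[name] = [
            [rng.uniform(-1.0, 1.0) for _ in range(n_inputs + 1)]
            for _ in range(NEURONS)
        ]
    return weights


def forward(genotype, weights):
    """Return the activity vector of every subsystem for one genotype.

    genotype: set of gene names assumed to be disrupted.
    """
    state = {}
    for name in ORDER:
        node = HIERARCHY[name]
        # Inputs: gene indicators first, then the children's activities.
        inputs = [1.0 if g in genotype else 0.0 for g in node["genes"]]
        for child in node["children"]:
            inputs.extend(state[child])
        state[name] = [
            math.tanh(row[-1] + sum(w * x for w, x in zip(row[:-1], inputs)))
            for row in weights[name]
        ]
    return state


weights = init_weights()
state = forward({"g1", "g3"}, weights)
for name in ORDER:
    print(name, [round(a, 3) for a in state[name]])
```

Because the wiring follows the hierarchy, a change in genotype can only alter a subsystem if it contains the affected gene or sits above one that does. In this toy example, disrupting g1 leaves the activity of complex_B unchanged. A trained model would additionally map the root ("cell") activity to a growth phenotype, which the sketch leaves out.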