Wang Yulin, Miao Hongyu
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.
Department of Biostatistics, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
BMC Syst Biol. 2017 May 4;11(1):53. doi: 10.1186/s12918-017-0432-2.
To systematically understand the interactions between numerous biological components, a variety of biological networks at different levels and scales have been constructed and made available in public databases and knowledge repositories. Graphical models such as structural equation models (SEMs) have long been used to describe biological networks for various quantitative analysis tasks, especially the estimation of key biological parameters. However, because of limited resources or technical capacity, biological networks are often only partially observed in experiments. An important question is therefore how to select unobserved nodes for additional measurement such that all unknown model parameters become identifiable. To the best of our knowledge, no solution to this problem existed prior to this study.
The identifiability-based observation problem for biological networks is formulated mathematically for the first time, based on linear recursive structural equation models, and a dynamic programming strategy is then developed to obtain the optimal observation strategies. The dynamic programming algorithm achieves its efficiency by avoiding both the symbolic computation and the matrix operations used in other studies. We also provide the necessary theoretical justification for the proposed method. Finally, we verify the algorithm using synthetic network structures and illustrate the practical application of the proposed method using a real biological network related to influenza A virus infection.
The proposed approach is the first solution to the structural identifiability-based optimal observation remedy problem. It is applicable to an arbitrary directed acyclic biological network (recursive SEM) without bidirectional edges, and it can be implemented as a computer program. Observation remedy is an important issue in experiment design for biological networks, and we believe that this study provides a solid basis for dealing with more challenging design issues (e.g., feedback loops, dynamic or nonlinear networks) in the future. We implemented our method in R; the implementation is freely accessible at https://github.com/Hongyu-Miao/SIOOR.
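To make the setting concrete, the following is a minimal sketch of a linear recursive SEM on a directed acyclic graph. It is not the authors' dynamic programming algorithm and does not implement their identifiability criterion. It only shows how a recursive (acyclic) structure can be checked and how total effects between nodes can be accumulated in topological order, without symbolic computation or matrix operations. The function names, node labels, and coefficients are hypothetical and chosen for illustration only.

```python
from collections import defaultdict, deque


def topological_order(nodes, edges):
    """Order nodes so that every parent precedes its children (Kahn's algorithm).

    Raises ValueError if the graph contains a directed cycle, in which case
    the corresponding SEM would not be recursive.
    """
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("graph is not acyclic, so the SEM is not recursive")
    return order


def total_effects(nodes, coef):
    """Compute total effects in a linear recursive SEM.

    coef maps (parent, child) -> path coefficient (hypothetical values).
    The total effect of s on t is the sum, over all directed paths from s
    to t, of the product of the coefficients along the path. Processing
    nodes in topological order lets each value be built from the values
    already computed for the node's parents.
    """
    order = topological_order(nodes, list(coef))
    parents = defaultdict(list)
    for parent, child in coef:
        parents[child].append(parent)
    total = {}
    for source in nodes:
        effect = {n: 0.0 for n in nodes}
        effect[source] = 1.0  # a node has unit effect on itself
        for node in order:
            if node == source:
                continue
            effect[node] = sum(
                (effect[p] * coef[(p, node)] for p in parents[node]), 0.0
            )
        for target in nodes:
            total[(source, target)] = effect[target]
    return total


if __name__ == "__main__":
    # Hypothetical four-node network: A -> B -> C -> D and A -> C.
    nodes = ["A", "B", "C", "D"]
    coef = {("A", "B"): 0.5, ("B", "C"): 2.0, ("A", "C"): 1.0, ("C", "D"): 3.0}
    effects = total_effects(nodes, coef)
    print(effects[("A", "C")], effects[("A", "D")])
```

In this hypothetical network, the total effect of A on C combines the direct edge A -> C with the indirect path A -> B -> C. In a partially observed network, relationships of this kind link the unknown coefficients to the quantities that can actually be measured.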