Gibbs David L, Shmulevich Ilya
Institute for Systems Biology, Seattle, Washington, United States of America.
PLoS Comput Biol. 2017 Jun 19;13(6):e1005591. doi: 10.1371/journal.pcbi.1005591. eCollection 2017 Jun.
The Influence Maximization Problem (IMP) aims to discover the set of nodes with the greatest influence on network dynamics. It has previously been applied in epidemiology and social network analysis. Here, we demonstrate its application to the analysis of the cell cycle regulatory network of Saccharomyces cerevisiae. Fundamentally, gene regulation is linked to the flow of information. Therefore, our implementation of the IMP was framed as an information-theoretic problem using network diffusion. Starting from more than 26,000 regulatory edges from YeastMine, we encoded gene expression dynamics as edge weights with time-lagged transfer entropy, a method for quantifying information transfer between variables. Starting from a chosen set of source nodes, a diffusion process covers a portion of the network, and the size of this cover reflects the influence of the source nodes. The set of nodes that maximizes influence is the solution to the IMP. Solving the IMP for different numbers of source nodes produced an influence ranking of genes. This ranking was compared with other network centrality measures. Although the top genes from each centrality ranking contained well-known cell cycle regulators, there was little agreement among the rankings and no clear winner. However, influential genes tended to directly regulate, or sit upstream of, genes ranked highly by other centrality measures. The influential nodes act as critical sources of information flow, potentially having a large impact on the state of the network. Biological events that affect influential nodes, and thereby information flow, could have a strong effect on network dynamics, potentially leading to disease. Code and data can be found at: https://github.com/gibbsdavidl/miergolf.
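The abstract encodes expression dynamics as edge weights via time-lagged transfer entropy. As a minimal sketch only (not the authors' estimator; the miergolf repository holds their actual code), the Python below computes a plug-in estimate in bits, assuming median binarization of each expression series, a one-step target history, and a single source value taken `lag` steps back, so that TE equals I(Y_t ; X_{t-lag} | Y_{t-1}). The names `discretize` and `transfer_entropy` are hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def discretize(series):
    """Binarize a real-valued series at its median (an assumed, simple scheme)."""
    m = statistics.median(series)
    return [1 if v > m else 0 for v in series]


def transfer_entropy(source, target, lag=1):
    """Plug-in estimate (bits) of TE source -> target = I(Y_t ; X_{t-lag} | Y_{t-1}).

    Assumes equal-length sequences of discrete symbols and a target history
    of length one; real analyses may use longer histories or other estimators.
    """
    if lag < 1:
        raise ValueError("lag must be >= 1")
    if len(source) != len(target):
        raise ValueError("series must have equal length")
    triples = Counter()  # counts of (y_t, y_{t-1}, x_{t-lag})
    for t in range(lag, len(target)):
        triples[(target[t], target[t - 1], source[t - lag])] += 1
    n = sum(triples.values())
    if n == 0:
        return 0.0
    pair_yx = Counter()  # counts of (y_{t-1}, x_{t-lag})
    pair_yy = Counter()  # counts of (y_t, y_{t-1})
    past_y = Counter()   # counts of y_{t-1}
    for (yt, yp, xs), c in triples.items():
        pair_yx[(yp, xs)] += c
        pair_yy[(yt, yp)] += c
        past_y[yp] += c
    te = 0.0
    for (yt, yp, xs), c in triples.items():
        # p(y_t | y_{t-1}, x) / p(y_t | y_{t-1}), written with raw counts
        ratio = (c * past_y[yp]) / (pair_yx[(yp, xs)] * pair_yy[(yt, yp)])
        te += (c / n) * math.log2(ratio)
    return te


# Toy check with synthetic data: y copies x with a one-step delay.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]
print(f"TE x->y: {transfer_entropy(x, y, lag=1):.3f}")  # close to 1 bit
print(f"TE y->x: {transfer_entropy(y, x, lag=1):.3f}")  # close to 0
```

The asymmetry is the point: information flows from x to y but not back, which is what makes TE usable as a directed edge weight. Plug-in estimates are biased upward on short series, so yeast time courses with few samples would call for more care than this sketch takes.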
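The influence step described in the abstract can also be sketched. The diffusion model here is an assumption made for illustration: with edge weights in (0, 1], a node counts as covered when its strongest path from some source (the product of edge weights along it) stays at or above a hypothetical `threshold`. Under that model the cover of a source set is the union of per-source reach sets, a monotone submodular function, so a greedy heuristic applies. Greedy selection is swapped in here as a common IMP approach and is not claimed to be the paper's optimizer. The names `reach_set` and `greedy_influence_ranking` and the toy graph are hypothetical.

```python
import heapq


def reach_set(graph, source, threshold):
    """Nodes whose strongest path from `source` has weight product >= threshold.

    graph: dict mapping node -> list of (neighbor, weight), weights in (0, 1].
    Because multiplying by weights <= 1 never increases strength, a
    Dijkstra-style search that pops the strongest node first is exact.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]
    while heap:
        neg_strength, node = heapq.heappop(heap)
        strength = -neg_strength
        if strength < best.get(node, 0.0):
            continue  # stale heap entry
        for nbr, w in graph.get(node, []):
            s = strength * w
            if s >= threshold and s > best.get(nbr, 0.0):
                best[nbr] = s
                heapq.heappush(heap, (-s, nbr))
    return set(best)


def greedy_influence_ranking(graph, k, threshold=0.5):
    """Greedily pick up to k sources; return [(node, cumulative_cover_size), ...]."""
    nodes = set(graph)
    for edges in graph.values():
        nodes.update(nbr for nbr, _ in edges)
    reach = {v: reach_set(graph, v, threshold) for v in nodes}
    covered, chosen, ranking = set(), set(), []
    for _ in range(min(k, len(nodes))):
        candidates = sorted(nodes - chosen)
        # max() returns the first maximal item, so ties go to the
        # alphabetically smallest node, keeping the output deterministic.
        best = max(candidates, key=lambda v: len(reach[v] - covered))
        chosen.add(best)
        covered |= reach[best]
        ranking.append((best, len(covered)))
    return ranking


# Toy weighted regulatory graph (not real data): A sits upstream of B, C, D.
TOY_GRAPH = {
    "A": [("B", 0.9), ("C", 0.8)],
    "B": [("D", 0.9)],
    "C": [("E", 0.3)],
    "E": [("F", 0.9)],
}
for rank, (node, cover) in enumerate(greedy_influence_ranking(TOY_GRAPH, k=3), start=1):
    print(rank, node, cover)  # prints "1 A 4", "2 E 6", "3 B 6" on separate lines
```

Because greedy picks form nested sets, the pick order directly gives a ranking over k = 1, 2, 3, and so on. An optimizer that solves each k independently need not return nested solutions, in which case a ranking would have to be aggregated across runs. In the toy graph the top-ranked node A is upstream of B and D, echoing the abstract's observation that influential genes tend to sit upstream of other central genes.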