Department of Computer Sciences, University of Wisconsin, Madison, WI, USA.
Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA.
PLoS Comput Biol. 2019 Jun 27;15(6):e1006758. doi: 10.1371/journal.pcbi.1006758. eCollection 2019 Jun.
Many biological studies involve either (i) manipulating some aspect of a cell or its environment and then simultaneously measuring the effect on thousands of genes, or (ii) systematically manipulating each gene and then measuring the effect on some response of interest. A common challenge in these studies is to explain how the genes identified as relevant in a given experiment are organized into a subnetwork that accounts for the response of interest. Subnetwork inference typically depends on information in public, structured databases, which are incomplete. However, a wealth of potentially relevant information resides in the scientific literature, such as genes associated with particular concepts of interest and interactions among various biological entities. We contend that by exploiting this information, we can improve the explanatory power and accuracy of subnetwork inference in multiple applications. Here we propose and investigate several ways in which information extracted from the scientific literature can be used to augment subnetwork inference. We show that literature-extracted information can be used to (i) augment the set of entities identified as relevant in a subnetwork inference task, (ii) augment the set of interactions used in the process, and (iii) support targeted browsing of a large inferred subnetwork by identifying entities and interactions that are closely related to concepts of interest. We use these approaches to uncover the pathways involved in interactions between a virus and a host cell, and the pathways regulated by a transcription factor associated with breast cancer. Our experimental results demonstrate that these approaches can yield more accurate and more interpretable subnetworks.
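The following is a minimal sketch of points (i) and (ii) only. It is not the authors' method, which, per the availability note, relies on an integer program. Instead, it swaps in a plain shortest-path heuristic and assumes an undirected interaction graph. The idea is to take the union of database and literature-extracted interactions, add literature-derived genes to the experimental hits, and keep the shortest path from each hit to a response node. All function names and the toy labels (`A`, `B`, `C`, `R`) are hypothetical.

```python
from collections import deque


def augment_network(db_edges, lit_edges):
    """Build an undirected adjacency map from database edges plus
    literature-extracted edges (hypothetical helper, point (ii))."""
    graph = {}
    for a, b in list(db_edges) + list(lit_edges):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns the node list of one shortest path,
    or None if either node is missing or the target is unreachable."""
    if source not in graph or target not in graph:
        return None
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk predecessor links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr in sorted(graph[node]):  # sorted for deterministic output
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    return None


def infer_subnetwork(graph, hits, response):
    """Union of shortest paths from each hit to the response node
    (a simplified stand-in for the paper's integer-program inference)."""
    nodes, edges = set(), set()
    for hit in hits:
        path = shortest_path(graph, hit, response)
        if path is None:
            continue  # hit cannot be connected in this network
        nodes.update(path)
        edges.update(tuple(sorted(pair)) for pair in zip(path, path[1:]))
    return nodes, edges


if __name__ == "__main__":
    db_edges = [("A", "B"), ("B", "R")]  # toy "database" interactions
    lit_edges = [("C", "B")]             # toy literature-extracted interaction
    hits = {"A"} | {"C"}                 # experimental hit plus literature-derived gene (point (i))
    graph = augment_network(db_edges, lit_edges)
    nodes, edges = infer_subnetwork(graph, hits, "R")
    print(sorted(nodes))  # ['A', 'B', 'C', 'R']
```

In this toy example, gene `C` can only be connected to the response when the literature-extracted edge is included. This mirrors, in miniature, the motivation for augmenting incomplete database interactions.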
Integer program code, background network data, and pathfinding code are available at https://github.com/Craven-Biostat-Lab/subnetwork_inference.