Novo Nordisk Foundation Center for Biosustainability at the University of California, San Diego, School of Medicine, La Jolla, California, United States of America.
Department of Pediatrics, University of California, San Diego, School of Medicine, La Jolla, California, United States of America.
PLoS Comput Biol. 2019 Jul 19;15(7):e1007185. doi: 10.1371/journal.pcbi.1007185. eCollection 2019 Jul.
To gain insight into complex biological processes, genome-scale data (e.g., RNA-Seq) are often overlaid on biochemical networks. However, because of isozymes and protein complexes, many networks lack a one-to-one relationship between genes and network edges, so decisions must be made about how to overlay data onto networks. For metabolic networks, these decisions include (1) how to integrate gene expression levels using gene-protein-reaction rules, (2) the approach used to select the expression thresholds at which the associated gene is considered "active", and (3) the order in which these steps are applied. The influence of these decisions has not been systematically tested. We compared 20 decision combinations using a transcriptomic dataset spanning 32 tissues and showed that the set of reactions considered active (i.e., reactions of the genome-scale metabolic network with a non-zero expression level after overlaying the data) is mainly influenced by the thresholding approach used. To determine the most appropriate decisions, we evaluated how they affect the acquisition of tissue-specific active reaction lists that recapitulate organ-system tissue groups. These results provide guidelines for improving data analyses with biochemical networks and facilitate the construction of context-specific metabolic models.
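The three decisions listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one common convention for gene-protein-reaction (GPR) rules, in which AND (protein complex) takes the minimum expression and OR (isozymes) takes the maximum, and it assumes a single global threshold. The function names (`gpr_score`, `threshold_then_map`, `map_then_threshold`), the gene identifiers, and the expression values are all hypothetical.

```python
import re


def gpr_score(rule, expr):
    """Evaluate a GPR rule string against a dict of gene expression values.

    Assumed convention (not taken from the abstract):
    AND -> min (all complex subunits needed), OR -> max (any isozyme suffices).
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            value = max(value, parse_and())
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            value = min(value, parse_atom())
        return value

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            pos += 1  # skip the closing ")"
            return value
        return expr[token]  # gene identifier

    return parse_or()


def threshold_then_map(rule, expr, threshold):
    """Order A: zero out genes below the threshold, then apply the GPR rule."""
    gated = {g: (v if v >= threshold else 0.0) for g, v in expr.items()}
    return gpr_score(rule, gated)


def map_then_threshold(rule, expr, threshold):
    """Order B: apply the GPR rule first, then threshold the reaction score."""
    score = gpr_score(rule, expr)
    return score if score >= threshold else 0.0


# Hypothetical expression values for three genes in one tissue.
expression = {"g1": 5.0, "g2": 0.5, "g3": 2.0}

complex_or_isozyme = gpr_score("(g1 and g2) or g3", expression)
complex_only_a = threshold_then_map("g1 and g2", expression, threshold=1.0)
complex_only_b = map_then_threshold("g1 and g2", expression, threshold=1.0)
```

A reaction counts as active when its score is non-zero, matching the definition in the abstract. Under these particular assumptions (min/max rules and one global threshold) both orders mark the same reactions as active; the orders can diverge when thresholds differ per gene or per reaction, or when a different GPR convention is used.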