Benedict Matthew N, Mundy Michael B, Henry Christopher S, Chia Nicholas, Price Nathan D
Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America.
Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota, United States of America.
PLoS Comput Biol. 2014 Oct 16;10(10):e1003882. doi: 10.1371/journal.pcbi.1003882. eCollection 2014 Oct.
Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks, and on algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling that largely ignores genomic information. Here we develop an approach that applies genomic information to predict alternative functions for genes and to estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than for those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach, called likelihood-based gap filling, to predict more genomically consistent solutions. To validate likelihood-based gap filling, we applied it to models from which essential pathways had been removed, finding that it identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled with likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions than models produced by parsimony-based approaches. Interestingly, despite these findings, likelihoods did not significantly affect the consistency of gap filled models with Biolog and knockout lethality data. This indicates that phenotype data alone cannot necessarily discriminate between alternative gap filling solutions and, therefore, that other information is needed to obtain a more accurate network. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via an API or a command-line web interface.
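The contrast between parsimony-based and likelihood-based gap filling can be illustrated with a toy sketch. This is not the authors' KBase implementation. The gene likelihoods, reaction names, and gene-protein-reaction (GPR) rules below are hypothetical. Combining gene likelihoods into a reaction likelihood by taking AND as min and OR as max is an assumption. The stoichiometric optimization used in real gap filling is replaced here by a shortest-path search over a linear pathway, with each reaction costing -log(likelihood) so that the cheapest path is the one with the highest product of likelihoods.

```python
import heapq
import math

# Hypothetical per-gene annotation likelihoods (made-up numbers; the paper
# derives such values from sequence homology).
GENE_LIKELIHOOD = {"g1": 0.05, "g2": 0.9, "g3": 0.8, "g4": 0.6}

# Hypothetical candidate gap-filling reactions: name -> (substrate, product, GPR).
# A GPR is either a gene id or a tuple ("and" | "or", child, child, ...).
CANDIDATES = {
    "r1": ("A", "T", "g1"),                 # short route, weak genomic evidence
    "r2": ("A", "B", ("or", "g2", "g4")),   # isozymes: either gene suffices
    "r3": ("B", "T", ("and", "g3", "g2")),  # complex: both genes required
}


def gpr_likelihood(gpr):
    """Assumed rule: AND -> min over subunits, OR -> max over alternatives."""
    if isinstance(gpr, str):
        return GENE_LIKELIHOOD.get(gpr, 0.0)
    op, *children = gpr
    values = [gpr_likelihood(child) for child in children]
    return min(values) if op == "and" else max(values)


def parsimony_cost(rxn):
    """Parsimony: every added reaction costs the same."""
    return 1.0


def likelihood_cost(rxn):
    """Likelihood-based: -log(likelihood), floored to avoid log(0)."""
    return -math.log(max(gpr_likelihood(CANDIDATES[rxn][2]), 1e-9))


def cheapest_path(source, target, cost_of):
    """Dijkstra over metabolites; each candidate reaction is one weighted edge."""
    frontier = [(0.0, source, [])]
    settled = set()
    while frontier:
        cost, met, path = heapq.heappop(frontier)
        if met == target:
            return path
        if met in settled:
            continue
        settled.add(met)
        for rxn, (sub, prod, _gpr) in CANDIDATES.items():
            if sub == met and prod not in settled:
                heapq.heappush(frontier, (cost + cost_of(rxn), prod, path + [rxn]))
    return None  # target unreachable with the candidate reactions


print(cheapest_path("A", "T", parsimony_cost))   # ['r1']
print(cheapest_path("A", "T", likelihood_cost))  # ['r2', 'r3']
```

Parsimony picks the single weakly supported reaction `r1`, while the likelihood-weighted search prefers the two-step route `r2`, `r3`, whose combined likelihood (0.9 × 0.8 = 0.72) is far higher than 0.05. Because every likelihood is at most 1, the -log costs are non-negative, which is what makes a Dijkstra-style search valid here.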