Lobo Daniel, Hammelman Jennifer, Levin Michael
1 Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, Maryland.
2 Center for Regenerative and Developmental Biology, and Department of Biology, Tufts University, Medford, Massachusetts.
J Comput Biol. 2016 Apr;23(4):291-7. doi: 10.1089/cmb.2015.0211. Epub 2016 Mar 7.
Automated methods for the reverse-engineering of complex regulatory networks are paving the way for the inference of mechanistic comprehensive models directly from experimental data. These novel methods can infer not only the relations and parameters of the known molecules defined in their input datasets, but also unknown components and pathways identified as necessary by the automated algorithms. Identifying the molecular nature of these unknown components is a crucial step for making testable predictions and experimentally validating the models, yet no specific and efficient tools exist to aid in this process. To this end, we present here MoCha (Molecular Characterization), a tool optimized for the search of unknown proteins and their pathways from a given set of known interacting proteins. MoCha uses the comprehensive dataset of protein-protein interactions provided by the STRING database, which currently includes more than a billion interactions from over 2,000 organisms. MoCha is highly optimized, performing typical searches within seconds. We demonstrate the use of MoCha with the characterization of unknown components from reverse-engineered models from the literature. MoCha is useful for working on network models by hand or as a downstream step of a model inference engine workflow and represents a valuable and efficient tool for the characterization of unknown pathways using known data from thousands of organisms. MoCha and its source code are freely available online under the GPLv3 license.
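The core idea of the search described above, finding candidate unknown proteins that interact with every protein in a known set, can be illustrated with a minimal sketch. This is not MoCha's implementation or API: the function names, the toy edge list, and the protein labels (A, B, W, X, Y, Z) are all hypothetical, and the ranking by summed score is an assumption made for illustration. The only detail taken from STRING is that its combined interaction scores are integers on a 0 to 1000 scale.

```python
from collections import defaultdict


def build_graph(edges, min_score=0):
    """Build an undirected interaction graph from (protein_a, protein_b, score) tuples.

    Edges scoring below min_score are discarded. Scores are assumed to follow
    the STRING convention of a combined score between 0 and 1000.
    """
    graph = defaultdict(dict)
    for a, b, score in edges:
        if score >= min_score:
            graph[a][b] = score
            graph[b][a] = score
    return graph


def candidate_unknowns(graph, known):
    """Return proteins that interact with every known protein, best first.

    Candidates are ranked by the sum of their interaction scores with the
    known proteins (an illustrative choice), with ties broken by name.
    """
    known = set(known)
    shared = None
    for protein in known:
        neighbors = set(graph.get(protein, {}))
        shared = neighbors if shared is None else shared & neighbors
    shared = (shared or set()) - known
    return sorted(shared, key=lambda c: (-sum(graph[c][p] for p in known), c))


# Hypothetical toy data; these are placeholders, not real STRING entries.
TOY_EDGES = [
    ("A", "X", 900),
    ("B", "X", 800),
    ("A", "Y", 700),
    ("B", "Y", 950),
    ("A", "Z", 990),
    ("B", "W", 300),
    ("A", "W", 990),
]

graph = build_graph(TOY_EDGES, min_score=400)
print(candidate_unknowns(graph, ["A", "B"]))
```

Raising `min_score` trades recall for confidence: in the toy data, the weak B-W edge is dropped at a threshold of 400, so W is no longer reported as a shared interactor of A and B.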