Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Yokohama 223-8522, Japan.
BMC Bioinformatics. 2012;13 Suppl 17(Suppl 17):S8. doi: 10.1186/1471-2105-13-S17-S8. Epub 2012 Dec 13.
Prediction of biochemical (metabolic) pathways has a wide range of applications, including the optimization of drug candidates, and the elucidation of toxicity mechanisms. Recently, several methods have been developed for pathway prediction to derive a goal compound from a start compound. However, these methods require high computational costs, and cannot perform comprehensive prediction of novel metabolic pathways. Our aim of this study is to develop a de novo prediction method for reconstructions of metabolic pathways and predictions of unknown biosynthetic pathways in the sense that it does not require any initial network such as KEGG metabolic network to be explored.
We formulated pathway prediction between a start compound and a goal compound as the shortest path search problem in terms of the number of enzyme reactions applied. We propose an efficient search method based on A* algorithm and heuristic techniques utilizing Linear Programming (LP) solution for estimation of the distance to the goal. First, a chemical compound is represented by a feature vector which counts frequencies of substructure occurrences in the structural formula. Second, an enzyme reaction is represented as an operator vector by detecting the structural changes to compounds before and after the reaction. By defining compound vectors as nodes and operator vectors as edges, prediction of the reaction pathway is reduced to the shortest path search problem in the vector space. In experiments on the DDT degradation pathway, we verify that the shortest paths predicted by our method are biologically correct pathways registered in the KEGG database. The results also demonstrate that the LP heuristics can achieve significant reduction in computation time. Furthermore, we apply our method to a secondary metabolite pathway of plant origin, and successfully find a novel biochemical pathway which cannot be predicted by the existing method. For the reconstruction of a known biochemical pathway, our method is over 40 times as fast as the existing method.
Our method enables fast and accurate de novo pathway predictions and novel pathway detection.
生化(代谢)途径的预测具有广泛的应用,包括优化候选药物和阐明毒性机制。最近,已经开发了几种方法来进行途径预测,以便从起始化合物推导目标化合物。然而,这些方法需要高的计算成本,并且不能对新的代谢途径进行全面预测。我们的研究目的是开发一种从头开始的方法来重建代谢途径,并预测未知的生物合成途径,因为它不需要探索任何初始网络,如 KEGG 代谢网络。
我们将起始化合物和目标化合物之间的途径预测表述为酶反应数量的最短路径搜索问题。我们提出了一种基于 A*算法和启发式技术的高效搜索方法,利用线性规划(LP)解决方案来估计到目标的距离。首先,将化学化合物表示为特征向量,该向量计算结构公式中结构出现的频率。其次,通过检测反应前后化合物的结构变化,将酶反应表示为算子向量。通过将化合物向量定义为节点,将算子向量定义为边,预测反应途径就简化为向量空间中的最短路径搜索问题。在 DDT 降解途径的实验中,我们验证了我们的方法预测的最短路径是在 KEGG 数据库中注册的生物正确途径。结果还表明,LP 启发式可以显著减少计算时间。此外,我们将我们的方法应用于植物来源的次生代谢途径,并成功地找到了一种无法通过现有方法预测的新的生化途径。对于已知生化途径的重建,我们的方法比现有方法快 40 多倍。
我们的方法能够实现快速准确的从头开始途径预测和新途径检测。