Kotera Masaaki, McDonald Andrew G, Boyce Sinéad, Tipton Keith F
School of Biochemistry and Immunology, Trinity College, Dublin 2, Ireland.
J Chem Inf Model. 2008 Dec;48(12):2335-49. doi: 10.1021/ci800213g.
The development of metabolomics has led to the discovery of an increasing number of orphan metabolites, defined as compounds that are known to be present in living organisms but whose synthetic and/or degradation pathways are unknown. In this paper, we describe a procedure for identifying possible products and/or precursors of such orphan metabolites and for simultaneously suggesting complete reaction equations and the corresponding EC (Enzyme Commission) numbers. Chemical structure comparison was performed both for pairs of compounds consisting of a reported substrate and its corresponding product and for pairs of randomly selected compounds. Possible combinations of compounds registered in the KEGG database were then used to generate putative enzyme reaction equations. This reproduced 77% of the reported equations; most of the remainder involve classes of compounds rather than specific compounds, or contain Markush structures. The quality of the generated equations was checked using chemical structure comparison and the random-tree method, which gave 98% accuracy in suggesting EC subsubclasses for reported equations in cross-validation tests. The equations generated in this study can be viewed using the Web-based program GREP (Generator of Reaction Equations & Pathways; http://bisscat.org/GREP/). The usefulness of the method for constructing possible metabolic pathways was demonstrated by mapping the generated equations for several groups of compounds, such as the betalain alkaloids. Possible extensions of the method, for finding alternative substrates for reported enzymes and for annotating enzyme functions in genomic research, are also discussed.
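The idea of pairing compounds and asking which pairs could plausibly be linked by a single reaction can be illustrated with a toy sketch. It is not the procedure used in the paper: instead of chemical structure comparison it compares only molecular formulas, and the pattern table, its labels, the function names, and the four example compounds are all assumptions chosen for illustration. No EC numbers are assigned.

```python
import re
from collections import Counter
from itertools import permutations


def parse_formula(formula):
    """Parse a simple molecular formula such as 'C9H11NO3' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def formula_difference(substrate, product):
    """Return the nonzero element changes going from substrate to product."""
    elements = set(substrate) | set(product)
    diff = {e: product[e] - substrate[e] for e in elements}
    return {e: n for e, n in diff.items() if n != 0}


# Hypothetical pattern table (an assumption, not taken from the paper):
# net elemental change of the main substrate-product pair.
PATTERNS = {
    "hydroxylation-like": {"O": 1},
    "decarboxylation-like": {"C": -1, "O": -2},
    "methylation-like": {"C": 1, "H": 2},
}

# Illustrative compound set with standard molecular formulas.
COMPOUNDS = {
    "L-tyrosine": "C9H11NO3",
    "L-DOPA": "C9H11NO4",
    "dopamine": "C8H11NO2",
    "3-methoxytyramine": "C9H13NO2",
}


def candidate_transformations(compounds, patterns):
    """List (substrate, product, label) for ordered pairs matching a pattern."""
    parsed = {name: parse_formula(f) for name, f in compounds.items()}
    hits = []
    for substrate, product in permutations(parsed, 2):
        diff = formula_difference(parsed[substrate], parsed[product])
        for label, pattern in patterns.items():
            if diff == pattern:
                hits.append((substrate, product, label))
    return hits


if __name__ == "__main__":
    for substrate, product, label in candidate_transformations(COMPOUNDS, PATTERNS):
        print(f"{substrate} -> {product}: {label}")
```

A formula-level match is only a coarse filter, since isomers share a formula and many unrelated pairs differ by the same atoms. That is why a structure-based comparison, as described in the abstract, is needed before a candidate pair can be treated as a putative reaction.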