Jeffryes James G, Colastani Ricardo L, Elbadawi-Sidhu Mona, Kind Tobias, Niehaus Thomas D, Broadbelt Linda J, Hanson Andrew D, Fiehn Oliver, Tyo Keith E J, Henry Christopher S
Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL USA ; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL USA.
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL USA.
J Cheminform. 2015 Aug 28;7:44. doi: 10.1186/s13321-015-0087-1. eCollection 2015.
In spite of its great promise, metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography-mass spectrometry (LC-MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases.
Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectrum than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC-MS accurate mass data enabled the identity of an unknown peak to be confidently predicted.
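The candidate counts reported above come from matching measured masses against database entries. A minimal sketch of such an accurate-mass lookup follows, assuming a protonated adduct ([M+H]+) and a 5 ppm tolerance. The names `Compound`, `ppm_error`, `find_candidates`, and `TOY_DB` are hypothetical and are not part of the MINE API, and the four entries are common metabolites chosen for illustration, not MINE records.

```python
from dataclasses import dataclass

# Mass of a proton in Da; assumed here as the shift for an [M+H]+ adduct.
PROTON_MASS = 1.007276


@dataclass(frozen=True)
class Compound:
    name: str
    monoisotopic_mass: float  # neutral monoisotopic mass in Da


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def find_candidates(mz, compounds, ppm_tolerance=5.0, adduct_shift=PROTON_MASS):
    """Return compounds whose neutral mass matches the observed m/z.

    The adduct shift is subtracted from the observed m/z to estimate the
    neutral mass, which is then compared with each entry. Hits are sorted
    by absolute ppm error, best match first.
    """
    neutral = mz - adduct_shift
    hits = [
        c for c in compounds
        if abs(ppm_error(neutral, c.monoisotopic_mass)) <= ppm_tolerance
    ]
    return sorted(hits, key=lambda c: abs(ppm_error(neutral, c.monoisotopic_mass)))


# Illustrative entries only (hypothetical toy database, not MINE data).
TOY_DB = [
    Compound("D-glucose", 180.06339),    # C6H12O6
    Compound("D-fructose", 180.06339),   # C6H12O6, isomer of glucose
    Compound("citric acid", 192.02700),  # C6H8O7
    Compound("caffeine", 194.08038),     # C8H10N4O2
]

if __name__ == "__main__":
    for hit in find_candidates(181.0707, TOY_DB):
        print(hit.name, hit.monoisotopic_mass)
```

In this toy case both hexose isomers are returned for a single m/z, which illustrates why the number of candidates per spectrum matters: accurate mass alone cannot separate isomers, so a smaller and more biologically relevant candidate set leaves less to disambiguate by other evidence.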
MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. Furthermore, MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures.
Graphical abstract: MINE database construction and access methods. The process of constructing a MINE database from the curated source databases is depicted on the left. The methods for accessing the database are shown on the right.