Cheminformatics and Metabolism, European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
BMC Bioinformatics. 2014 Jul 5;15:234. doi: 10.1186/1471-2105-15-234.
In metabolomics experiments, spectral fingerprints of metabolites with no known structural identity are routinely detected. Computer-assisted structure elucidation (CASE) has been used to determine the structures of such unknown compounds. It is generally accepted that a single 1D NMR spectrum or mass spectrum is usually not sufficient to establish the identity of a hitherto unknown compound. When a suite of spectra from 1D and 2D NMR experiments, supplemented with a molecular formula, is available, successful elucidation of chemical structures with up to 30 heavy atoms has been reported previously by one of the authors. In high-throughput metabolomics, however, usually only 1D NMR or mass spectrometry experiments are conducted, to allow rapid analysis of samples. This approach requires that the spectral patterns be analyzed automatically to quickly identify known and unknown structures. In this study, we investigated whether additional existing knowledge, such as the fact that the unknown compound is a natural product, can be used to improve the ranking of the correct structure in the result list after the structure elucidation process.
To identify unknowns using as little spectroscopic information as possible, we implemented an evolutionary algorithm-based CASE mechanism that elucidates candidates in a fully automated fashion, taking as input the molecular formula and 13C NMR spectrum of the isolated compound. We also tested how filters such as natural product-likeness, a measure of the similarity of a compound to known natural product space, might enhance the performance and quality of the structure elucidation. The evolutionary algorithm is implemented within the previously reported SENECA package for CASE, which is available for free download under the Artistic License at http://sourceforge.net/projects/seneca/. The natural product-likeness calculator is incorporated as a plugin within SENECA and is available as a GUI client and a command-line executable. Significant improvements in candidate ranking were demonstrated for 41 small test molecules when the CASE system was supplemented by a natural product-likeness filter.
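The effect of such a filter on ranking can be illustrated with a minimal sketch. It is not SENECA's scoring function: the weighted sum, the `np_weight` parameter, the 0 to 1 score scales, and the three candidates with their scores are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str              # hypothetical label for a candidate structure
    spectral_score: float  # agreement with the 13C NMR spectrum (assumed 0..1 scale)
    np_likeness: float     # natural product-likeness (assumed 0..1 scale)


def rank_candidates(candidates, np_weight=0.5):
    """Sort candidates, best first, by a weighted sum of the two scores.

    The weighted sum is an illustrative assumption, not the method
    used by SENECA. np_weight=0.0 ranks by spectral fit alone.
    """
    def combined(c):
        return (1.0 - np_weight) * c.spectral_score + np_weight * c.np_likeness

    return sorted(candidates, key=combined, reverse=True)


# Invented example data: A fits the spectrum best but is not NP-like,
# B fits almost as well and is NP-like, C fits poorly but is NP-like.
candidates = [
    Candidate("A", spectral_score=0.90, np_likeness=0.20),
    Candidate("B", spectral_score=0.88, np_likeness=0.80),
    Candidate("C", spectral_score=0.60, np_likeness=0.90),
]

spectral_only = [c.name for c in rank_candidates(candidates, np_weight=0.0)]
with_np_filter = [c.name for c in rank_candidates(candidates, np_weight=0.5)]
print(spectral_only)
print(with_np_filter)
```

In this toy case, two candidates that a 13C spectrum alone barely separates (A and B) are separated once the natural product-likeness term is included, which is the kind of spectroscopically underdetermined situation the study addresses.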
In spectroscopically underdetermined structure elucidation problems, natural product-likeness can contribute to a better ranking of the correct structure in the results list.