TNO Research Group Quality and Safety, Zeist, The Netherlands.
PLoS One. 2011;6(12):e28966. doi: 10.1371/journal.pone.0028966. Epub 2011 Dec 14.
While the entirety of 'Chemical Space' is huge (and assumed to contain between 10^63 and 10^200 'small molecules'), distinct subsets of this space can nonetheless be defined according to certain structural parameters. An example of such a subspace is the chemical space spanned by endogenous metabolites, defined as 'naturally occurring' products of an organism's metabolism. To understand this part of chemical space in more detail, we analyzed the chemical space populated by human metabolites in two ways. First, we performed Principal Component Analysis (PCA), hierarchical clustering and scaffold analysis of metabolites and non-metabolites to determine which chemical features are characteristic of each class of compounds. Here we found that heteroatom (both oxygen and nitrogen) content, as well as the presence of particular ring systems, distinguished the two groups of compounds. Second, we established which molecular descriptors and classifiers are capable of distinguishing metabolites from non-metabolites by assigning a 'metabolite-likeness' score. The combination of MDL Public Keys and Random Forest exhibited the best overall classification performance, with an AUC value of 99.13%, a specificity of 99.84% and a selectivity of 88.79%. This performance is slightly better than that of previous classifiers. Interestingly, we found that drugs occupy two distinct areas of metabolite-likeness, one being more 'synthetic' and the other more 'metabolite-like'. In addition, on a truly prospective dataset of 457 compounds, 95.84% correct classification was achieved. Overall, we are confident that we have contributed to the task of classifying metabolites, as well as to a better understanding of metabolite chemical space.
This knowledge can now be used in the development of new drugs that need to resemble metabolites and, in our own work in particular, for assessing the metabolite-likeness of candidate molecules during metabolite identification in metabolomics.
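The figures reported above (AUC and specificity) are standard summaries of how well a score separates two classes. As a minimal sketch of how such numbers can be computed from 'metabolite-likeness' scores, the Python below uses only the standard library. The labels, scores, the 0.5 threshold and all function names are illustrative assumptions and are not taken from the study. The sketch also reports sensitivity (true positive rate) as the usual companion to specificity; whether this corresponds to the 'selectivity' quoted above is not stated in the abstract and is not assumed here.

```python
from typing import List, Tuple

# Toy data (hypothetical): 1 = metabolite, 0 = non-metabolite.
LABELS: List[int] = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical metabolite-likeness scores in [0, 1], e.g. a classifier's
# predicted probability of the "metabolite" class.
SCORES: List[float] = [0.95, 0.80, 0.60, 0.40, 0.55, 0.30, 0.20, 0.05]


def confusion_counts(labels: List[int], scores: List[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def specificity(labels: List[int], scores: List[float],
                threshold: float = 0.5) -> float:
    """Fraction of true negatives among all actual negatives."""
    _, fp, tn, _ = confusion_counts(labels, scores, threshold)
    return tn / (tn + fp)


def sensitivity(labels: List[int], scores: List[float],
                threshold: float = 0.5) -> float:
    """Fraction of true positives among all actual positives."""
    tp, _, _, fn = confusion_counts(labels, scores, threshold)
    return tp / (tp + fn)


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half); equivalent to the area under the ROC."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(f"specificity: {specificity(LABELS, SCORES):.4f}")
    print(f"sensitivity: {sensitivity(LABELS, SCORES):.4f}")
    print(f"AUC:         {roc_auc(LABELS, SCORES):.4f}")
```

Note that AUC is threshold-free, whereas specificity and sensitivity depend on the cut-off chosen for the score, so a pair of such values always refers to one particular operating point.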