T3S, Inserm UMR S-1124, Université de Paris, Paris, France.
Inserm U1133, CNRS UMR 8251, Université de Paris, Paris, France.
PLoS One. 2021 May 28;16(5):e0252486. doi: 10.1371/journal.pone.0252486. eCollection 2021.
This study aims to highlight the relationships between the structure of odorant compounds and their odors. For this purpose, heterogeneous data sources were screened, and 6038 odorant compounds and their known associated odors (162 odor notes) were compiled, with each molecule represented by a set of 1024 structural fingerprints. Several dimensionality reduction techniques (PCA, MDS, t-SNE and UMAP) were assessed with two clustering methods (k-means and agglomerative hierarchical clustering, AHC) on the basis of the calculated fingerprints. Combining UMAP with the k-means and AHC methods gave clusters that represented the odors well, as well as the best visualization of the proximity of odorants on the basis of their molecular structures. The presence or absence of molecular substructures was calculated for each odorant in order to link chemical groups to odors. The results of this analysis bring out some associations between odor notes and the chemical structures of the molecules, such as "woody" and "spicy" notes with allylic and bicyclic structures, "balsamic" notes with unsaturated rings, both "sulfurous" and "citrus" notes with aldehydes, alcohols, carboxylic acids, amines and sulfur compounds, and "oily", "fatty" and "fruity" notes with esters and long carbon chains. Overall, the use of UMAP combined with clustering is a promising method for suggesting hypotheses on odorant structure-odor relationships.
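The step that links substructure presence or absence to odor notes can be illustrated with a minimal sketch. The abstract does not state which statistic was used, so the lift ratio below, the function name `substructure_odor_lift`, and the toy records are all assumptions made for illustration and are not taken from the study's dataset or code.

```python
from collections import Counter

# Hypothetical toy records: (substructures present, odor notes).
# These are made-up placeholders, NOT entries from the study's dataset.
TOY_ODORANTS = [
    ({"ester", "long_chain"}, {"fruity", "fatty"}),
    ({"ester"}, {"fruity"}),
    ({"aldehyde"}, {"citrus"}),
    ({"aldehyde", "sulfur"}, {"sulfurous"}),
    ({"bicyclic"}, {"woody"}),
    ({"ester", "bicyclic"}, {"woody", "fruity"}),
]


def substructure_odor_lift(records):
    """Return {(substructure, odor): lift} for every co-occurring pair.

    lift = P(substructure | odor) / P(substructure).
    A lift above 1 means the substructure is over-represented among
    molecules carrying that odor note. This statistic is an assumed
    stand-in; the abstract does not name the one the authors used.
    """
    n = len(records)
    sub_counts = Counter()
    odor_counts = Counter()
    pair_counts = Counter()
    for subs, odors in records:
        sub_counts.update(subs)
        odor_counts.update(odors)
        pair_counts.update((s, o) for s in subs for o in odors)
    return {
        (s, o): (c / odor_counts[o]) / (sub_counts[s] / n)
        for (s, o), c in pair_counts.items()
    }


lift = substructure_odor_lift(TOY_ODORANTS)
# Show the pairs with the strongest association first.
for (sub, odor), value in sorted(lift.items(), key=lambda kv: -kv[1]):
    print(f"{sub:>10} ~ {odor:<10} lift = {value:.2f}")
```

On real data the same counting would presumably be run per cluster or per odor note over the 6038 compounds, with the presence/absence flags coming from the computed substructures rather than from hand-written sets.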