Department of Information Technology, Indian Institute of Information Technology Allahabad, Allahabad, India.
Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), Mohali, India.
J Biomol Struct Dyn. 2024;42(20):10727-10738. doi: 10.1080/07391102.2023.2258415. Epub 2023 Sep 18.
Determining the structure-odor relationship has always been a challenging task. The main difficulty in correlating molecular structure with its associated odor is the ambiguous and obscure nature of verbally defined odor descriptors, particularly when the odorant molecules come from different sources. With recent developments in machine learning (ML), ML and data analytic techniques are increasingly being used for quantitative structure-activity relationship (QSAR) modeling in chemistry, supporting knowledge discovery where traditional Edisonian methods have not been useful. Odor perception is one such problem, as olfaction is among the least understood of the senses. In this study, an XGBoost odor prediction model was developed to classify the smells of odorant molecules from their SMILES strings. We first collected a dataset of 1278 odorant molecules labeled with seven basic odor descriptors and then calculated 1875 physicochemical properties for these molecules. To obtain relevant physicochemical features, principal component analysis (PCA) was employed for feature reduction. When tested on an independent test dataset, the ML model predicted all seven basic smells with high precision (>99%) and high sensitivity (>99%). The results were also compared with those of three recent studies and indicate that the XGBoost-PCA model performed better than the other models for predicting common odor descriptors. The methodology and ML model developed in this study may help in understanding the structure-odor relationship.
Communicated by Ramaswamy H. Sarma.
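The pipeline summarized above (descriptor matrix, PCA feature reduction, a boosted-tree classifier per odor, precision and sensitivity on held-out data) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The SMILES-to-descriptor step is omitted and replaced by a random synthetic matrix. scikit-learn's `GradientBoostingClassifier` stands in for XGBoost. The choice to train one binary classifier per descriptor (one-vs-rest), the 10 PCA components, and the descriptor names `odor_0` to `odor_6` are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical descriptor names; the abstract only states that there are seven.
DESCRIPTORS = [f"odor_{i}" for i in range(7)]


def make_synthetic_data(n_molecules=200, n_features=50, seed=0):
    """Random stand-in for a molecular descriptor matrix (rows = molecules)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_molecules, n_features))
    # One random linear projection per odor label, thresholded at its median,
    # so each synthetic label is balanced and learnable from X.
    W = rng.normal(size=(n_features, len(DESCRIPTORS)))
    scores = X @ W
    Y = (scores > np.median(scores, axis=0)).astype(int)
    return X, Y


def train_and_evaluate(X, Y, n_components=10, seed=0):
    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=seed)
    # Fit scaling and PCA on the training split only, to avoid test-set leakage.
    scaler = StandardScaler().fit(X_tr)
    pca = PCA(n_components=n_components, random_state=seed).fit(scaler.transform(X_tr))
    Z_tr = pca.transform(scaler.transform(X_tr))
    Z_te = pca.transform(scaler.transform(X_te))

    results = {}
    for j, name in enumerate(DESCRIPTORS):
        # Assumption: one binary gradient-boosted classifier per odor descriptor.
        clf = GradientBoostingClassifier(n_estimators=50, random_state=seed)
        clf.fit(Z_tr, Y_tr[:, j])
        pred = clf.predict(Z_te)
        results[name] = {
            "precision": precision_score(Y_te[:, j], pred, zero_division=0),
            # Sensitivity is the same quantity as recall.
            "sensitivity": recall_score(Y_te[:, j], pred, zero_division=0),
        }
    return results


if __name__ == "__main__":
    X, Y = make_synthetic_data()
    for name, m in train_and_evaluate(X, Y).items():
        print(f"{name}: precision={m['precision']:.2f} sensitivity={m['sensitivity']:.2f}")
```

Because the data are random, the printed scores say nothing about the study's reported >99% figures. To try the real library, one could swap `GradientBoostingClassifier` for `xgboost.XGBClassifier`, which follows the same `fit`/`predict` interface.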