XGBoost odor prediction model: finding the structure-odor relationship of odorant molecules using the extreme gradient boosting algorithm.

Affiliations

Department of Information Technology, Indian Institute of Information Technology Allahabad, Allahabad, India.

Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), Mohali, India.

Publication information

J Biomol Struct Dyn. 2024;42(20):10727-10738. doi: 10.1080/07391102.2023.2258415. Epub 2023 Sep 18.

DOI:10.1080/07391102.2023.2258415

PMID:37723894

Abstract

Determining the structure-odor relationship has always been a challenging task. The main difficulty in relating a molecular structure to its associated odor is the ambiguity of verbally defined odor descriptors, particularly when the odorant molecules come from different sources. With recent developments in machine learning (ML), ML and data analytic techniques are increasingly used for quantitative structure-activity relationship (QSAR) modeling in chemistry, supporting knowledge discovery where traditional Edisonian methods have not been useful. Predicting the smell of odorant molecules is one such task, as olfaction is among the least understood of the senses. In this study, an XGBoost odor prediction model was built to classify the smells of odorant molecules from their SMILES strings. We first collected a dataset of 1278 odorant molecules labeled with seven basic odor descriptors, and then calculated 1875 physicochemical properties for these molecules. To retain the relevant physicochemical features, principal component analysis (PCA), a dimensionality reduction technique, was also applied. The resulting ML model predicted all seven basic smells with high precision (>99%) and high sensitivity (>99%) when tested on an independent test dataset. The results were also compared with those of three recent studies and indicate that the XGBoost-PCA model performed better than the other models at predicting common odor descriptors. The methodology and ML model developed in this study may be helpful in understanding the structure-odor relationship.

Communicated by Ramaswamy H. Sarma.
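The precision and sensitivity figures quoted in the abstract are per-descriptor metrics. As a rough illustration of how such numbers can be computed, the sketch below derives one-vs-rest precision and sensitivity (recall) for each class from true and predicted labels. It assumes one odor label per molecule, which the abstract does not state, and the function name and the labels `odor_a`, `odor_b`, `odor_c` are hypothetical placeholders, not data or code from the study.

```python
from collections import Counter


def per_class_precision_sensitivity(y_true, y_pred):
    """Return {label: (precision, sensitivity)} from one-vs-rest counts.

    Assumes single-label classification (an assumption of this sketch).
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]   # all predictions of this class
        actual = tp[label] + fn[label]      # all true members of this class
        precision = tp[label] / predicted if predicted else 0.0
        sensitivity = tp[label] / actual if actual else 0.0
        scores[label] = (precision, sensitivity)
    return scores


# Hypothetical labels for illustration only (not data from the study).
y_true = ["odor_a", "odor_a", "odor_b", "odor_b", "odor_c", "odor_c"]
y_pred = ["odor_a", "odor_b", "odor_b", "odor_b", "odor_c", "odor_c"]
scores = per_class_precision_sensitivity(y_true, y_pred)
for label, (precision, sensitivity) in scores.items():
    print(f"{label}: precision={precision:.2f}, sensitivity={sensitivity:.2f}")
```

Reporting both values per descriptor matters because a classifier can score high on one while failing on the other: in the toy data, `odor_a` is never predicted wrongly (perfect precision), yet half of its true members are missed.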

Similar articles

Data based predictive models for odor perception.

Sci Rep. 2020 Oct 13;10(1):17136. doi: 10.1038/s41598-020-73978-1.

Machine-Learning-Based Olfactometer: Prediction of Odor Perception from Physicochemical Features of Odorant Molecules.

Anal Chem. 2017 Nov 21;89(22):11999-12005. doi: 10.1021/acs.analchem.7b02389. Epub 2017 Nov 7.

SMILES to Smell: Decoding the Structure-Odor Relationship of Chemical Compounds Using the Deep Neural Network Approach.

J Chem Inf Model. 2021 Feb 22;61(2):676-688. doi: 10.1021/acs.jcim.0c01288. Epub 2021 Jan 15.

A deep position-encoding model for predicting olfactory perception from molecular structures and electrostatics.

NPJ Syst Biol Appl. 2024 Jul 17;10(1):76. doi: 10.1038/s41540-024-00401-0.

Accurate prediction of personalized olfactory perception from large-scale chemoinformatic features.

Gigascience. 2018 Feb 1;7(2):1-11. doi: 10.1093/gigascience/gix127.

Is It Possible to Predict the Odor of a Molecule on the Basis of its Structure?

Int J Mol Sci. 2019 Jun 20;20(12):3018. doi: 10.3390/ijms20123018.

Engineering Aspects of Olfaction

SmellSpace: An Odor-Based Social Network as a Platform for Collecting Olfactory Perceptual Data.

Chem Senses. 2019 Apr 15;44(4):267-278. doi: 10.1093/chemse/bjz014.

Predicting odor pleasantness from odorant structure: pleasantness as a reflection of the physical world.

J Neurosci. 2007 Sep 12;27(37):10015-23. doi: 10.1523/JNEUROSCI.1158-07.2007.

Cited by

Machine learning enables construction of a nomogram based on risk factors for adverse emotions in patients with diabetic foot infection.

Am J Transl Res. 2025 Aug 15;17(8):6056-6067. doi: 10.62347/ZWGQ9542. eCollection 2025.

A systematic review of data and models for predicting food flavor and texture.

Curr Res Food Sci. 2025 Jun 26;11:101127. doi: 10.1016/j.crfs.2025.101127. eCollection 2025.

An overview on olfaction in the biological, analytical, computational, and machine learning fields.

Arch Pharm (Weinheim). 2025 Jan;358(1):e2400414. doi: 10.1002/ardp.202400414. Epub 2024 Oct 22.

Research on prediction of in open-pit mine used RUN-XGBoost model.

Heliyon. 2024 Mar 20;10(7):e28246. doi: 10.1016/j.heliyon.2024.e28246. eCollection 2024 Apr 15.
