Predicting human odor perception represented by continuous values from mass spectra of essential oils resembling chemical mixtures.

Author information

Department of Information and Communications Engineering, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan.

Laboratory for Future Interdisciplinary Research in Science and Technology, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan.

Publication information

PLoS One. 2020 Jun 19;15(6):e0234688. doi: 10.1371/journal.pone.0234688. eCollection 2020.

Abstract

Recent work has advanced the prediction of odor characteristics from the molecular structure parameters of chemicals. Although such parameters are available for individual chemicals, they cannot be used for chemical mixtures. This study presents a computational method for predicting human odor perception from the mass spectra of chemical mixtures such as essential oils. It also proposes a method for obtaining the similarity among odor descriptors even though the dataset contains only binary values. When a database lists a set of odor descriptors for a sample, only binary data are available, and the correlation between similar descriptors is lost. As a result, prediction performance degrades because the similarity among odor descriptors is not taken into account. Because the mass spectra dataset is high-dimensional, we use an auto-encoder to learn a compressed representation of the mass spectra of essential oils in its bottleneck hidden layer. We then perform hierarchical clustering to create groups of odor descriptors with similar odor impressions, using a matrix of continuous value-based correlation coefficients as well as natural language processing. This work explains how to overcome the binary value problem and how to find the similarity among odor descriptors using machine learning together with natural-language semantic representations of words. To address the imbalance between the positive and negative classes in both the continuous value-based correlation coefficient model and the word similarity-based model, we use the Synthetic Minority Oversampling Technique (SMOTE). The resulting model allows us to predict human odor perception through computer simulation by forming odor descriptor groups. Accordingly, this study demonstrates the feasibility of combining machine learning with natural language processing and SMOTE to predict odor descriptor groups from the mass spectra of essential oils.
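The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the general SMOTE idea: a synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation. The function name `smote_sketch`, its parameters, and the three-dimensional "latent codes" are hypothetical and chosen only for illustration; a real pipeline would more likely rely on an established library.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Illustrative only: a simplified version of the SMOTE idea, not the
    procedure used in the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 3-dimensional latent codes for the rare (positive) class.
positives = [[0.1, 0.9, 0.3], [0.2, 0.8, 0.4], [0.15, 0.95, 0.25]]
new_points = smote_sketch(positives, n_new=4)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region already occupied by that class, which is why the technique adds positive examples without duplicating them exactly.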

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a69c/7304616/9d067f22d426/pone.0234688.g001.jpg
