School of Food and Health, Beijing Technology and Business University, Beijing, 100048, China.
Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, 100093, China.
J Agric Food Chem. 2023 Mar 8;71(9):4172-4183. doi: 10.1021/acs.jafc.2c08822. Epub 2023 Feb 24.
Astringency is a puckering or velvety sensation derived mainly from flavonoid compounds in food. The traditional experimental approach to discovering astringent compounds is labor-intensive and costly, whereas machine learning (ML) can greatly accelerate the process. Herein, we propose the Flavonoid Astringency Prediction Database (FAPD), built on ML. First, the Molecular Fingerprint Similarities (MFSs) and astringency thresholds of flavonoid compounds were analyzed by hierarchical clustering. For astringency threshold prediction, four regression models (Gaussian Process Regression (GPR), Support Vector Regression (SVR), Random Forest (RF), and Gradient Boosted Decision Tree (GBDT)) were established; the best model was RF, which was interpreted using the SHapley Additive exPlanations (SHAP) approach. For astringency type prediction, six classification models (RF, GBDT, Gaussian Naive Bayes (GNB), Support Vector Machine (SVM), k-Nearest Neighbor (kNN), and Stochastic Gradient Descent (SGD)) were established; the best model was SGD. Furthermore, over 1200 natural flavonoid compounds were discovered and built into the customized FAPD. In FAPD, the astringency thresholds were predicted by RF, the astringency types were classified by SGD, and the real and predicted astringency types were verified by t-Distributed Stochastic Neighbor Embedding (t-SNE). Therefore, ML models can be used to predict the astringency threshold and astringency type of flavonoid compounds, providing a new paradigm for studying the relationship between molecular structure and flavor properties of food components.
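The first step of the workflow, clustering compounds by fingerprint similarity, can be illustrated with a minimal sketch. The abstract does not state which fingerprint type, similarity metric, or linkage rule was used, so the choices here (Tanimoto similarity, average linkage) are assumptions, and the compound names and on-bit sets in `TOY_FPS` are hypothetical toy data rather than real flavonoid fingerprints.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def agglomerate(fps: dict, n_clusters: int) -> list:
    """Average-linkage agglomerative clustering with distance = 1 - Tanimoto.

    Assumption: the linkage rule is illustrative only; the abstract does
    not specify one.
    """
    clusters = [[name] for name in fps]  # start with one cluster per compound

    def dist(c1: list, c2: list) -> float:
        # Mean pairwise distance between the members of two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(1.0 - tanimoto(fps[x], fps[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j is guaranteed) and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy fingerprints: each value is the set of "on" bit positions.
TOY_FPS = {
    "cpd_A": frozenset({1, 2, 3, 4}),
    "cpd_B": frozenset({1, 2, 3, 5}),
    "cpd_C": frozenset({10, 11, 12}),
    "cpd_D": frozenset({10, 11, 13}),
}

if __name__ == "__main__":
    print(tanimoto(TOY_FPS["cpd_A"], TOY_FPS["cpd_B"]))
    print(agglomerate(TOY_FPS, n_clusters=2))
```

In practice, fingerprints would be computed from molecular structures with a cheminformatics toolkit, and the resulting clusters could then be compared against measured astringency thresholds; the sketch only shows the similarity and clustering logic on hand-written bit sets.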