Wang Yu-Tang, Yang Zhao-Xia, Piao Zan-Hao, Xu Xiao-Juan, Yu Jun-Hong, Zhang Ying-Hua
Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China.
Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, China.
RSC Adv. 2021 Nov 17;11(58):36942-36950. doi: 10.1039/d1ra06551c. eCollection 2021 Nov 10.
To make a preliminary prediction of the flavor and retention index (RI) of compounds in beer, this work applied machine learning to models built on molecular structure. Flavor compounds in beer were collected from the existing literature, and the resulting database was classified into four groups: aromatic, bitter, sulfury, and others. The RI values on a non-polar SE-30 column and a polar Carbowax 20M column, obtained from the National Institute of Standards and Technology (NIST), were investigated. The structures were converted to molecular descriptors calculated with Molecular Operating Environment (MOE), ChemoPy, and Mordred, respectively. After pretreatment of the descriptors, three machine learning classifiers, support vector machine (SVM), random forest (RF), and k-nearest neighbour (kNN), were used to build the beer flavor models. Principal component regression (PCR), random forest regression (RFR), and partial least squares (PLS) regression were employed to predict the RI. Test-set accuracy was obtained for the SVM, RF, and kNN models; among them, the combination of Mordred descriptors and the RF model afforded the highest accuracy, 0.686. The R² of the optimal regression model reached 0.96. The results indicate that the models can be used to predict the flavor of a specific compound in beer and its RI value.
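The workflow summarized in the abstract (descriptor pretreatment, a flavor classifier scored by test-set accuracy, and an RI regression scored by R²) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a kNN classifier, chosen only because it fits in the Python standard library, whereas the abstract reports RF with Mordred descriptors as the best combination. The pretreatment steps (dropping constant columns, z-score scaling) are common choices that the abstract does not specify. All function names, descriptor values, and RI values below are invented for illustration.

```python
import math
from collections import Counter

def drop_constant_columns(rows):
    """Remove descriptor columns that take a single value (assumed pretreatment)."""
    keep = [j for j in range(len(rows[0])) if len({r[j] for r in rows}) > 1]
    return [[r[j] for j in keep] for r in rows], keep

def fit_scaler(rows):
    """Compute per-column (mean, std) on the training set for z-score scaling."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        stats.append((mean, std or 1.0))  # guard against zero std
    return stats

def scale(row, stats):
    return [(v - m) / s for v, (m, s) in zip(row, stats)]

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic descriptor vectors (NOT real MOE/ChemoPy/Mordred output).
# The last column is constant and is removed by the pretreatment.
train_x_raw = [
    [1.0, 0.2, 0.0, 5.0], [1.2, 0.1, 0.0, 5.0],   # "aromatic"
    [3.0, 2.0, 0.0, 5.0], [3.2, 2.1, 0.0, 5.0],   # "bitter"
    [0.8, 0.5, 1.0, 5.0], [0.9, 0.4, 1.0, 5.0],   # "sulfury"
]
train_y = ["aromatic", "aromatic", "bitter", "bitter", "sulfury", "sulfury"]
test_x_raw = [[1.1, 0.15, 0.0, 5.0], [3.1, 2.05, 0.0, 5.0], [0.85, 0.45, 1.0, 5.0]]
test_y = ["aromatic", "bitter", "sulfury"]

# Pretreatment is fitted on the training set and reused on the test set.
train_x, keep = drop_constant_columns(train_x_raw)
stats = fit_scaler(train_x)
train_x = [scale(r, stats) for r in train_x]
test_x = [scale([r[j] for j in keep], stats) for r in test_x_raw]

predictions = [knn_predict(train_x, train_y, q, k=3) for q in test_x]
test_accuracy = accuracy(test_y, predictions)

# Invented observed vs. predicted RI values, only to show the R² calculation.
ri_r2 = r_squared([800.0, 1000.0, 1200.0], [810.0, 990.0, 1200.0])

print("flavor predictions:", predictions)
print("test accuracy:", test_accuracy)
print("RI R^2:", round(ri_r2, 4))
```

Fitting the scaler on the training set alone and reusing its statistics on the test set keeps test-set information out of the model, which matters when accuracy is reported on held-out compounds as in the abstract.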