ITC Life Sciences and Technology Centre, Peenya Industrial Area, 1st Phase, Bengaluru 560058, India.
Centre for Advanced Process Decision Making, Carnegie Mellon University, USA.
Comput Biol Chem. 2024 Aug;111:108116. doi: 10.1016/j.compbiolchem.2024.108116. Epub 2024 May 29.
Taste is crucial in driving food choice and preference. Umami is one of the basic tastes, defined by the characteristic deliciousness and mouthfulness that it imparts to foods. Identifying ingredients that enhance umami taste is of significant value to the food industry. Various models have been shown to predict umami taste using feature encodings derived from traditional molecular descriptors such as amphiphilic pseudo-amino acid composition, dipeptide composition, and composition-transition-distribution. The highest reported accuracy, 90.5%, was recently achieved through a novel model architecture. Here, we propose the use of biological sequence transformers such as ProtBert and ESM2, trained on the UniRef databases, as the feature encoder block. Combining 2 encoders with 2 classifiers yielded 4 model architectures. Among these 4 models, the ProtBert-CNN model outperformed the others, with an accuracy of 95% on 5-fold cross-validation data and 94% on independent data.
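The evaluation setup described above (2 encoders crossed with 2 classifiers, each scored by 5-fold cross-validation accuracy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the second classifier, so `classifier_2` is a placeholder, and `fit_predict` is a hypothetical callable standing in for "encode the sequences, train a classifier, predict labels". The fold splitter is unshuffled for simplicity.

```python
from itertools import product

# Encoder names come from the abstract. The second classifier is not named
# there, so "classifier_2" is a placeholder, not the authors' choice.
ENCODERS = ["ProtBert", "ESM2"]
CLASSIFIERS = ["CNN", "classifier_2"]


def model_grid(encoders, classifiers):
    """Return every encoder-classifier pairing as 'encoder-classifier'."""
    return [f"{enc}-{clf}" for enc, clf in product(encoders, classifiers)]


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Yields (train_indices, test_indices) pairs. No shuffling is done here;
    a real pipeline would usually shuffle or stratify first.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def cross_val_accuracy(fit_predict, features, labels, k=5):
    """Mean accuracy over k folds.

    fit_predict(train_x, train_y, test_x) is a hypothetical callable that
    trains one model and returns predicted labels for test_x.
    """
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        preds = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in test],
        )
        scores.append(accuracy([labels[i] for i in test], preds))
    return sum(scores) / k
```

In this sketch, `model_grid(ENCODERS, CLASSIFIERS)` lists the 4 pairings, and `cross_val_accuracy` would be called once per pairing with a `fit_predict` that wraps the corresponding encoder and classifier. Accuracy on the independent data would be computed separately with `accuracy` on a held-out set.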