Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand.
Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
Int J Mol Sci. 2021 Dec 4;22(23):13124. doi: 10.3390/ijms222313124.
Umami ingredients have been identified as important factors in food seasoning and production. Traditional experimental methods for characterizing peptides with umami sensory properties (umami peptides) are time-consuming, laborious, and costly. It is therefore desirable to develop computational tools that can screen available sequences at scale to identify novel umami peptides. Although a computational tool has already been developed for this purpose, its predictive performance is still insufficient. In this study, we use a feature representation learning approach to create a novel machine-learning meta-predictor, called UMPred-FRL, for improved umami peptide identification. To develop the final meta-predictor, we combined six well-known machine learning algorithms (extremely randomized trees, k-nearest neighbor, logistic regression, partial least squares, random forest, and support vector machine) with seven different feature encodings derived from amino acid composition, amphiphilic pseudo-amino acid composition, dipeptide composition, composition-transition-distribution, and pseudo-amino acid composition. Extensive experimental results demonstrated that UMPred-FRL was effective. It achieved more accurate performance than its baseline models on the benchmark dataset and consistently outperformed the existing method on the independent test dataset. Finally, to aid in the high-throughput identification of umami peptides, the UMPred-FRL web server was established and made freely available online. UMPred-FRL is expected to be a powerful tool for cost-effective, large-scale screening of candidate peptides with potential umami sensory properties.
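To make the feature representation learning idea more concrete, here is a minimal Python sketch. It rests on several assumptions and is not UMPred-FRL's actual code. Only two of the encodings are implemented: amino acid composition (AAC) and dipeptide composition (DPC). The baseline learner is a hypothetical nearest-centroid scorer standing in for the six algorithms named above. The names `aac`, `dpc`, `train_centroid_model`, `fit_baselines`, and `probabilistic_features` are illustrative only. Each baseline model, trained on one encoding, outputs a probability, and these probabilities are concatenated into a new, compact feature vector on which a meta-predictor could then be trained.

```python
import math
from itertools import product

# The 20 standard amino acids and the 400 ordered dipeptides built from them.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = seq.upper()
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: fraction of each ordered residue pair (400 values)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = {}
    for p in pairs:
        counts[p] = counts.get(p, 0) + 1
    return [counts.get(d, 0) / len(pairs) for d in DIPEPTIDES]


def train_centroid_model(X, y):
    """Hypothetical stand-in baseline: compare distances to the class centroids."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = centroid([x for x, label in zip(X, y) if label == 1])
    neg = centroid([x for x, label in zip(X, y) if label == 0])

    def predict_proba(x):
        d_pos, d_neg = math.dist(x, pos), math.dist(x, neg)
        # Closer to the positive (umami) centroid -> score nearer to 1.
        return d_neg / (d_pos + d_neg) if d_pos + d_neg > 0 else 0.5

    return predict_proba


# One baseline model per feature encoding (the paper pairs 6 algorithms x 7 encodings).
ENCODINGS = {"AAC": aac, "DPC": dpc}


def fit_baselines(seqs, labels):
    """Train one baseline model on each encoding of the training sequences."""
    return {
        name: train_centroid_model([enc(s) for s in seqs], labels)
        for name, enc in ENCODINGS.items()
    }


def probabilistic_features(seq, models):
    """Concatenate each baseline's predicted probability into a new feature vector."""
    return [models[name](ENCODINGS[name](seq)) for name in ENCODINGS]


# Toy sequences and labels for illustration only (not the benchmark dataset).
models = fit_baselines(["AAAA", "AAAC", "WWWW", "WWWY"], [1, 1, 0, 0])
print(probabilistic_features("AAAA", models))  # two scores, both above 0.5
```

In a full pipeline of this kind, the probabilistic feature vectors used to train the final meta-classifier would normally come from cross-validated predictions, so that the meta-model is not fit on the baselines' own training-set outputs.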