Comprehensive Evaluation and Comparison of Machine Learning Methods in QSAR Modeling of Antioxidant Tripeptides.

Authors

Du Zhenjiao, Wang Donghai, Li Yonghui

Affiliations

Department of Grain Science and Industry, Kansas State University, Manhattan, Kansas 66506, United States.

Department of Biological and Agricultural Engineering, Kansas State University, Manhattan, Kansas 66506, United States.

Publication information

ACS Omega. 2022 Jul 15;7(29):25760-25771. doi: 10.1021/acsomega.2c03062. eCollection 2022 Jul 26.

Abstract

Antioxidant peptides have attracted increasing interest because of their multiple beneficial effects. Currently, the screening and identification of bioactive peptides, including antioxidant peptides, by wet-chemistry methods is time-consuming and relies heavily on advanced instruments and trained personnel. Quantitative structure-activity relationship (QSAR) analysis, as an in silico method, can be more efficient and cost-effective. However, the performance of QSAR models for antioxidant peptides has remained poor because few model development approaches have been tried. The objective of this study was to compare popular machine learning methods for modeling and screening the antioxidant activity of tripeptides and to identify the critical amino acid features that determine antioxidant activity. A total of 533 numerical indices of amino acids were adopted to characterize 130 tripeptides with known antioxidant activity from the published literature, and 7 feature selection strategies plus pairwise correlation were then used to screen the most important indices for antioxidant activity and model building. Fourteen machine learning methods were each used to build models with each of the feature selection strategies. Among the resulting 98 models, nonlinear regression methods tended to perform better, and the best model for tripeptide antioxidants, with an R² of 0.847 and an RMSE of 0.627, was obtained by combining random forest for feature selection with tree-based extreme gradient boosting regression for model development. Based on the predicted antioxidant values of 7870 unknown tripeptides, the tripeptides with potentially high antioxidant activity all have a tyrosine, tryptophan, or cysteine residue at the C-terminal position. Furthermore, the predicted antioxidant activity of six synthesized tripeptides was confirmed through experimental determination, and, for the first time, a cysteine or tyrosine residue at the C-terminus was found to be critical to antioxidant activity based on both QSAR models and experimental observations.
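
The screening figures in the abstract follow from simple counting: the 20 standard amino acids give 20³ = 8000 possible tripeptides, and removing the 130 with measured activity leaves 7870 unknown candidates. The sketch below illustrates that enumeration and a C-terminal residue filter. It is not the authors' code: the function names are hypothetical, the `known` list is a placeholder rather than the real training set, and it assumes sequences are written N-terminus first in one-letter codes.

```python
from itertools import product

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_tripeptides():
    """Return every possible tripeptide sequence (20**3 = 8000)."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]


def unknown_tripeptides(known):
    """Return the tripeptides that are not in the measured set `known`."""
    known = set(known)
    return [t for t in all_tripeptides() if t not in known]


def has_c_terminal(seq, residues=("Y", "W", "C")):
    """True if the last (C-terminal) residue is Tyr, Trp, or Cys by default."""
    return seq[-1] in residues


if __name__ == "__main__":
    # Placeholder only: the first 130 enumerated sequences stand in for the
    # 130 literature tripeptides, which are not listed in the abstract.
    known = all_tripeptides()[:130]
    candidates = unknown_tripeptides(known)
    print(len(candidates))  # 7870
    # Candidates carrying Y, W, or C at the C-terminal position.
    flagged = [t for t in candidates if has_c_terminal(t)]
    print(flagged[:5])
```

In an actual screen, each candidate would be encoded with the selected amino acid indices and scored by the trained regression model; the residue filter here only shows how the reported C-terminal pattern could be checked against a ranked list.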


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3368/9330208/e13af71386f4/ao2c03062_0002.jpg
