Structure-aware machine learning strategies for antimicrobial peptide discovery.

Affiliation

Department of Biotechnology and Biochemistry, Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV-IPN), Irapuato Unit, 36824, Irapuato, Guanajuato, Mexico.

Publication information

Sci Rep. 2024 May 25;14(1):11995. doi: 10.1038/s41598-024-62419-y.

Abstract

Machine learning models are revolutionizing our approaches to discovering and designing bioactive peptides. These models often lack protein structure awareness because they rely heavily on sequence data. They excel at identifying sequences of a particular biological nature or activity, but they frequently fail to capture the intricate mechanism(s) of action behind that activity. To address both problems at once, we studied the mechanisms of action and structural landscape of antimicrobial peptides grouped into (i) membrane-disrupting peptides, (ii) membrane-penetrating peptides, and (iii) protein-binding peptides. By analyzing critical features such as dipeptides and physicochemical descriptors, we developed models with high accuracy (86-88%) in predicting these categories. However, our initial models (1.0 and 2.0) exhibited a bias towards α-helical and coiled structures, which influenced predictions. To address this structural bias, we implemented subset selection and data reduction strategies. The former gave three sets of structure-specific models for peptides likely to fold into α-helices (models 1.1 and 2.1), coils (1.3 and 2.3), or mixed structures (1.4 and 2.4). The latter depleted over-represented structures, yielding the structure-agnostic predictors 1.5 and 2.5. Additionally, our research highlights the sensitivity of important features to different structure classes across models.
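The abstract names two ideas that are easy to illustrate in code: dipeptide features computed from a sequence, and data reduction that depletes over-represented structure classes. The following is only a minimal sketch under stated assumptions, not the authors' pipeline. The function names `dipeptide_composition` and `deplete_overrepresented` are hypothetical, and downsampling every class to the size of the smallest one is an assumed reduction rule; the abstract does not say which criterion was actually used.

```python
from collections import Counter
from itertools import product
import random

# The 20 standard amino acids and all 400 ordered dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
VALID = set(DIPEPTIDES)


def dipeptide_composition(seq):
    """Return a 400-element vector of overlapping dipeptide frequencies.

    Pairs containing non-standard residues are ignored (an assumption
    made for this sketch).
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(p for p in pairs if p in VALID)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def deplete_overrepresented(records, key, seed=0):
    """Downsample each structure class to the size of the smallest class.

    `records` is any list; `key` maps a record to its structure label.
    Equalizing to the smallest class is an assumed rule, chosen only to
    illustrate the idea of depleting over-represented structures.
    """
    rng = random.Random(seed)
    groups = {}
    for record in records:
        groups.setdefault(key(record), []).append(record)
    n = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, n))
    return balanced


# Toy usage: a three-residue sequence and an imbalanced set of labels.
vec = dipeptide_composition("AAK")
records = [(f"p{i}", "helix") for i in range(6)] + [(f"q{i}", "coil") for i in range(2)]
balanced = deplete_overrepresented(records, key=lambda r: r[1])
```

In the toy usage, `vec` has non-zero entries only for the dipeptides `AA` and `AK`, and `balanced` keeps two records per structure label. Physicochemical descriptors, which the abstract also mentions, would be appended to such a feature vector before training a classifier.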

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b08/11127937/f33313bfec78/41598_2024_62419_Fig1_HTML.jpg
