Department of Biotechnology and Biochemistry, Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV-IPN), Irapuato Unit, 36824, Irapuato, Guanajuato, Mexico.
Sci Rep. 2024 May 25;14(1):11995. doi: 10.1038/s41598-024-62419-y.
Machine learning models are revolutionizing how we discover and design bioactive peptides. Because these models rely heavily on sequence data, they often lack awareness of protein structure. They excel at identifying sequences with a particular biological property or activity, but they frequently fail to capture the underlying mechanism(s) of action. To address both problems at once, we studied the mechanisms of action and the structural landscape of antimicrobial peptides grouped as (i) membrane-disrupting peptides, (ii) membrane-penetrating peptides, and (iii) protein-binding peptides. By analyzing critical features such as dipeptides and physicochemical descriptors, we developed models that predict these categories with high accuracy (86-88%). However, our initial models (1.0 and 2.0) were biased towards α-helical and coiled structures, and this bias influenced their predictions. To correct it, we implemented subset-selection and data-reduction strategies. Subset selection yielded three structure-specific models for peptides likely to fold into α-helices (models 1.1 and 2.1), coils (1.3 and 2.3), or mixed structures (1.4 and 2.4). Data reduction depleted over-represented structures, yielding the structure-agnostic predictors 1.5 and 2.5. Additionally, our research shows how sensitive the important features are to different structure classes across models.
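The abstract names two technical ingredients without spelling them out: dipeptide features and data reduction of over-represented structure classes. The Python sketch below illustrates one common way to realize each, under stated assumptions; it is not the authors' pipeline. It assumes that dipeptide composition means the normalized frequency of each of the 400 overlapping residue pairs, and that data reduction means random undersampling of every structure class down to the size of the smallest one. The function names, the record layout, and the class labels (`"helix"`, `"coil"`, `"mixed"`) are hypothetical.

```python
from collections import Counter
import random

# The 20 standard amino acids; the 400 ordered pairs form the dipeptide feature space.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = [a + b for a in AMINO_ACIDS for b in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Hypothetical helper: frequency of each dipeptide among overlapping pairs in seq."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Drop pairs containing non-standard residues (e.g. 'X').
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(pairs)
    total = len(pairs)
    return [counts[d] / total for d in DIPEPTIDES]


def downsample_by_structure(records, seed=0):
    """Hypothetical helper: randomly undersample each structure class to the smallest class size.

    Each record is assumed to be a dict with a "structure" key holding its class label.
    """
    by_class = {}
    for rec in records:
        by_class.setdefault(rec["structure"], []).append(rec)
    target = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    balanced = []
    for label in sorted(by_class):
        balanced.extend(rng.sample(by_class[label], target))
    return balanced
```

For example, `dipeptide_composition("GIGKFLHSAKKF")` returns a 400-element vector in which the entry for `"KF"` equals 2/11, because that pair occurs twice among the 11 overlapping pairs. Undersampling to the smallest class is only one possible depletion rule; a real study might instead cap over-represented classes at some other size.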