School of Medicine, Shanghai University, Shanghai, 200444, China.
Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China.
Adv Biol (Weinh). 2023 Jun;7(6):e2200232. doi: 10.1002/adbi.202200232. Epub 2023 Feb 12.
Peptides offer growing advantages and significant clinical value in drug discovery and development. With advances in high-throughput technologies and artificial intelligence (AI), machine learning (ML) methods for discovering new lead peptides have expanded and been incorporated into rational drug design. Predicting peptide-protein interactions (PepPIs) and protein-protein interactions (PPIs) presents both an opportunity and a challenge in computational biology; such predictions will help clarify the mechanisms of disease and drive the discovery of lead peptides. This paper comprehensively reviews computational models for PepPI and PPI prediction. It begins by introducing databases of peptide ligands and target proteins, then discusses data formats and feature representations for proteins and peptides. Next, classical ML methods and emerging deep learning (DL) methods that can be used to train PepPI and PPI prediction models are classified into four categories, and their advantages and disadvantages are analyzed. To assess the relative performance of different models, validation protocols and evaluation metrics are discussed. The goal of this review is to help researchers quickly begin developing computational frameworks with these integrated resources and ultimately promote the discovery of lead peptides.