Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia.
Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan.
Sci Rep. 2023 Nov 28;13(1):20882. doi: 10.1038/s41598-023-47624-5.
Protein-peptide interactions play a crucial role in various cellular processes and are implicated in abnormal cellular behaviors leading to diseases such as cancer. Therefore, understanding these interactions is vital for both functional genomics and drug discovery efforts. Despite a significant increase in the availability of protein-peptide complexes, experimental methods for studying these interactions remain laborious, time-consuming, and expensive. Computational methods offer a complementary approach but often fall short in terms of prediction accuracy. To address these challenges, we introduce PepCNN, a deep learning-based prediction model that incorporates structural and sequence-based information from primary protein sequences. By combining half-sphere exposure, position-specific scoring matrices from a multiple-sequence alignment tool, and embeddings from a pre-trained protein language model, PepCNN outperforms state-of-the-art methods in terms of specificity, precision, and AUC. The PepCNN software and datasets are publicly available at https://github.com/abelavit/PepCNN.git.
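As an illustration of how the three feature types named in the abstract could be joined into a single per-residue input, the following is a minimal sketch and not the PepCNN implementation. The feature widths (`HSE_DIM`, `PSSM_DIM`, `EMBED_DIM`), the helper names, and the zero-padded sliding window are all assumptions made for this example; the abstract does not specify them, and random numbers stand in for real features.

```python
import random

# Hypothetical per-residue feature widths; the abstract does not state them.
HSE_DIM = 2       # assumed number of half-sphere exposure values per residue
PSSM_DIM = 20     # one PSSM column per standard amino acid
EMBED_DIM = 1024  # assumed width of the language-model embedding


def combine_features(hse, pssm, embedding):
    """Concatenate the three per-residue feature rows into one row each."""
    if not (len(hse) == len(pssm) == len(embedding)):
        raise ValueError("all feature matrices must cover the same residues")
    return [h + p + e for h, p, e in zip(hse, pssm, embedding)]


def residue_windows(features, half_width):
    """Return one zero-padded window of neighbouring residues per residue.

    Windowing is an assumption of this sketch: it is one common way to give
    a convolutional model local sequence context around each residue.
    """
    if not features:
        return []
    width = len(features[0])
    pad = [[0.0] * width for _ in range(half_width)]
    padded = pad + features + pad
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(features))]


def random_matrix(rng, rows, cols):
    """Placeholder data standing in for real HSE, PSSM, or embedding values."""
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_residues = 50  # placeholder protein length
features = combine_features(
    random_matrix(rng, n_residues, HSE_DIM),
    random_matrix(rng, n_residues, PSSM_DIM),
    random_matrix(rng, n_residues, EMBED_DIM),
)
windows = residue_windows(features, half_width=3)
print(len(features), len(features[0]))  # residues, features per residue
print(len(windows), len(windows[0]))    # windows, residues per window
```

Each window could then be passed to a convolutional classifier that labels the central residue as peptide-binding or non-binding; the actual input layout and network design are defined in the PepCNN repository.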