Le Van-The, Zhan Zi-Jun, Vu Thi-Thu-Phuong, Malik Muhammad-Shahid, Ou Yu-Yen
Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan.
Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li, 32003, Taiwan.
J Mol Graph Model. 2024 Jul;130:108777. doi: 10.1016/j.jmgm.2024.108777. Epub 2024 Apr 17.
This study investigates the prediction of protein-peptide interactions with machine learning, comparing sequence-based models, standard CNNs, and traditional classifiers. Combining pre-trained protein language models with multi-view window scanning CNNs, our approach yields significant improvements; ProtTrans, pre-trained on 2.1 billion protein sequences comprising 393 billion amino acids, stands out. The integrated model achieves an AUC of 0.856 and 0.823 on the PepBCL Set_1 and Set_2 datasets, respectively, and a Precision of 0.564 on Set_1 and 0.527 on Set_2, surpassing previous methods. We also explore applications of the model in cancer therapy, particularly identifying peptide interactions for selective targeting of cancer cells, as well as in other fields. These findings contribute to bioinformatics and offer insights for drug discovery and therapeutic development.
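The AUC and Precision figures quoted above are the kind of metric typically computed per residue, with each residue labeled as peptide-binding or not. The following is a minimal sketch of how such metrics can be calculated, not the authors' evaluation code; the toy labels, scores, the 0.5 threshold, and the function names `precision` and `roc_auc` are all assumptions made for illustration.

```python
def precision(labels, preds):
    """Fraction of residues predicted as binding that are truly binding."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def roc_auc(labels, scores):
    """AUC as the probability that a random binding residue is scored
    higher than a random non-binding residue (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one residue of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = binding residue, 0 = non-binding residue.
labels = [1, 0, 0, 1, 0, 1, 0, 0]
# Hypothetical per-residue binding scores from some classifier.
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.5, 0.3]

# Assumed decision threshold; precision depends on it, AUC does not.
threshold = 0.5
preds = [1 if s >= threshold else 0 for s in scores]

print("precision:", precision(labels, preds))
print("AUC:", roc_auc(labels, scores))
```

Because binding residues are usually a small minority of all residues, precision can be modest even when AUC is high, which is why both metrics are worth reporting together.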