Wang Ruheng, Nakai Kenta, Wei Leyi
Department of Biomedical Engineering, University of Texas Southwestern Medical Center, Dallas, TX, USA.
Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan.
Methods Mol Biol. 2025;2941:269-278. doi: 10.1007/978-1-0716-4623-6_16.
Identifying protein-peptide binding residues is fundamentally important for understanding the mechanisms of protein function and for drug discovery. Although several computational methods have been developed, they rely heavily on third-party tools or information for feature design, which readily leads to low computational efficiency and poor predictive performance. We describe how to use PepBCL, an end-to-end computational method that requires no feature design, for high-throughput prediction of protein-peptide binding sites. In benchmark comparisons, PepBCL outperforms state-of-the-art methods and achieves more robust performance using protein sequences alone. By introducing a well-pretrained protein large language model, it automatically extracts and learns high-level latent representations of protein sequences that are relevant to protein structure and function. We give an overview of the method and discuss how to run the accompanying code to reproduce our predictor.