Badrinarayanan Srivathsan, Guntuboina Chakradhar, Mollaei Parisa, Barati Farimani Amir
Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States.
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States.
J Chem Inf Model. 2025 Jan 13;65(1):83-91. doi: 10.1021/acs.jcim.4c01443. Epub 2024 Dec 19.
Peptides are crucial in biological processes and therapeutic applications. Given their importance, advancing our ability to predict peptide properties is essential. In this study, we introduce Multi-Peptide, an innovative approach that combines transformer-based language models with graph neural networks (GNNs) to predict peptide properties. We integrate PeptideBERT, a transformer model specifically designed for peptide property prediction, with a GNN encoder to capture both sequence-based and structural features. By employing a contrastive loss framework, Multi-Peptide aligns embeddings from both modalities into a shared latent space, thereby enhancing the transformer model's predictive accuracy. Evaluations on hemolysis and nonfouling data sets demonstrate Multi-Peptide's robustness, achieving state-of-the-art 88.057% accuracy in hemolysis prediction. This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
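The alignment step described above can be illustrated with a minimal sketch. The abstract does not state the exact form of the contrastive loss, so this assumes a symmetric, CLIP-style (InfoNCE) objective in which the sequence embedding and the graph embedding of the same peptide form the positive pair and all other pairings in the batch act as negatives. The function names, the `temperature` value, and the toy embeddings are illustrative assumptions, not taken from the Multi-Peptide code. Plain lists stand in for the outputs of PeptideBERT and the GNN encoder.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_contrastive_loss(seq_emb, graph_emb, temperature=0.1):
    """Assumed CLIP-style loss: seq_emb[i] and graph_emb[i] are a positive pair."""
    if not seq_emb or len(seq_emb) != len(graph_emb):
        raise ValueError("both modalities need the same non-zero batch size")

    s = [l2_normalize(v) for v in seq_emb]
    g = [l2_normalize(v) for v in graph_emb]
    n = len(s)

    # logits[i][j]: cosine similarity of sequence i and graph j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(s[i], g[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Sequence-to-graph direction: the correct "class" for row i is column i.
    loss_s2g = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Graph-to-sequence direction: same computation on the transposed matrix.
    logits_t = [list(col) for col in zip(*logits)]
    loss_g2s = -sum(log_softmax(logits_t[j])[j] for j in range(n)) / n

    return 0.5 * (loss_s2g + loss_g2s)


# Toy batch of two peptides with 2-dimensional embeddings (hypothetical values).
seq_batch = [[1.0, 0.0], [0.0, 1.0]]
graph_aligned = [[1.0, 0.0], [0.0, 1.0]]  # each graph matches its own sequence
graph_swapped = [[0.0, 1.0], [1.0, 0.0]]  # each graph matches the other sequence

loss_aligned = symmetric_contrastive_loss(seq_batch, graph_aligned)
loss_swapped = symmetric_contrastive_loss(seq_batch, graph_swapped)
print(loss_aligned < loss_swapped)
```

The loss is small when each peptide's two embeddings point the same way and large when they are mismatched, which is the pressure that pulls both modalities into a shared latent space. In a real training setup the embeddings would typically first pass through learned projection heads, and the loss would be computed with a tensor library rather than Python lists.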