School of Basic Medical Sciences, Lanzhou University, Lanzhou, 730000, China.
The Second Hospital Clinical Medical School, Lanzhou University, Lanzhou, 730000, China.
Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae473.
Interest in cyclic peptide therapeutics is growing rapidly because of their many advantages and strong druggability potential. However, characterizing their biological activities with traditional wet-lab methods is costly and inefficient. Artificial intelligence offers a faster and more energy-efficient alternative. MuCoCP aims to provide a complete pre-trained model that extracts latent features of cyclic peptides and can be fine-tuned to accurately predict cyclic peptide bioactivity on a variety of downstream tasks. To maximize its effectiveness, we use a novel data augmentation method based on a priori chemical knowledge, together with multiple unsupervised training objectives, to greatly improve the model's ability to capture information.
To assess the model, we validated it on the membrane permeability of cyclic peptides. With semi-supervised training, it achieved an accuracy of 0.87 and an R-squared of 0.503 on CycPeptMPDB; with frozen parameters, it obtained an accuracy of 0.84 and an R-squared of 0.384 on an external dataset. These results are state of the art and substantiate the stability and generalization capability of MuCoCP. They indicate that MuCoCP can thoroughly explore the high-dimensional information of cyclic peptides and make accurate predictions on downstream bioactivity tasks, which can guide the future de novo design of cyclic peptide drugs and promote their development.
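For readers less familiar with the two reported metrics, the following is a minimal sketch of how an accuracy and an R-squared could be computed from measured and predicted permeability values. It is not taken from the MuCoCP code base. The function names, the sample values, and the threshold of -6.0 used to split peptides into permeable and non-permeable classes are all assumptions made for illustration; the abstract does not state how the classification labels were derived.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(y_true, y_pred, threshold=-6.0):
    """Fraction of samples whose measured and predicted values fall on
    the same side of the threshold (the threshold is an assumed cutoff)."""
    hits = sum((t >= threshold) == (p >= threshold)
               for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Made-up values for illustration only; not data from the paper.
measured = [-5.0, -6.0, -7.0, -8.0]
predicted = [-5.5, -6.5, -6.5, -8.0]

print("R-squared:", round(r_squared(measured, predicted), 3))
print("Accuracy:", accuracy(measured, predicted))
```

R-squared rewards predictions that track the measured values closely, while the thresholded accuracy only checks whether each peptide lands in the correct class, which is why the two numbers can differ considerably on the same dataset.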
All code used in our proposed method can be found at https://github.com/lennonyu11234/MuCoCP.