de Oliveira Ewerton Cristhian Lima, Santana Kauê, Josino Luiz, Lima E Lima Anderson Henrique, de Souza de Sales Júnior Claudomiro
Institute of Technology, Federal University of Pará, Belém, Pará, 66075-110, Brazil.
Institute of Biodiversity, Federal University of Western Pará, Vera Paz street, s/n Salé, Santarém, Pará, 68040-255, Brazil.
Sci Rep. 2021 Apr 7;11(1):7628. doi: 10.1038/s41598-021-87134-w.
Cell-penetrating peptides (CPPs) are naturally able to cross the lipid bilayer membrane that protects cells. These peptides share common structural and physicochemical properties and have various pharmaceutical applications, of which drug delivery is the most important. Because they can cross membranes while carrying high-molecular-weight polar molecules with them, they are termed Trojan horses. In this study, we propose a machine learning (ML)-based framework named BChemRF-CPPred (beyond chemical rules-based framework for CPP prediction) that uses an artificial neural network, a support vector machine, and a Gaussian process classifier to differentiate CPPs from non-CPPs, using structure- and sequence-based descriptors extracted from PDB and FASTA formats. The performance of the algorithm was evaluated by tenfold cross-validation and compared with that of previously reported prediction tools on an independent dataset. BChemRF-CPPred satisfactorily identified CPP-like structures in natural and synthetic modified peptide libraries and outperformed previously reported ML-based algorithms, reaching an independent-test accuracy of 90.66% (AUC = 0.9365) for PDB input and 86.5% (AUC = 0.9216) for FASTA input. Moreover, our analyses of the CPP chemical space demonstrated that these peptides break some of the molecular rules used to predict the permeability of therapeutic molecules across cell membranes. This is the first comprehensive analysis to predict synthetic and natural CPP structures and to evaluate their chemical space using an ML-based framework. Our algorithm is freely available for academic use at http://comptools.linc.ufpa.br/BChemRF-CPPred .
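To illustrate what a sequence-based descriptor extracted from FASTA input can look like, the following is a minimal sketch only. The abstract does not list the descriptors BChemRF-CPPred actually uses, so the function names (`parse_fasta`, `simple_descriptors`), the three descriptors, and the example sequence are all assumptions chosen for illustration, and the net charge is a crude residue count rather than a pH-dependent calculation.

```python
from typing import Dict, Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def simple_descriptors(seq: str) -> Dict[str, float]:
    """Hypothetical descriptors, NOT the set used by BChemRF-CPPred.

    net_charge counts K/R as +1 and D/E as -1 and ignores histidine,
    termini, and pH effects.
    """
    length = len(seq)
    n_basic = sum(seq.count(res) for res in "KR")
    n_acidic = sum(seq.count(res) for res in "DE")
    return {
        "length": length,
        "net_charge": n_basic - n_acidic,
        "basic_fraction": n_basic / length if length else 0.0,
    }


# Illustrative input: an arginine-rich peptide written as a FASTA record.
example = ">example_peptide\nYGRKKRRQRRR\n"
records = list(parse_fasta(example))
features = simple_descriptors(records[0][1])
print(records[0][0], features)
```

Feature dictionaries of this kind could then be converted to numeric vectors and passed to a classifier; the choice of classifier and its evaluation by cross-validation are outside the scope of this sketch.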