College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China.
Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China.
Bioinformatics. 2020 May 1;36(10):3028-3034. doi: 10.1093/bioinformatics/btaa131.
Cell-penetrating peptides (CPPs) are vehicles for transporting pharmacologically active molecules, such as short interfering RNAs, nanoparticles, plasmid DNAs and small peptides, into living cells, and thus offer great potential as future therapeutics. Existing experimental techniques for identifying CPPs are time-consuming and expensive, so computational prediction of CPPs from peptide sequences can help annotate candidates and guide experiments quickly. Many machine learning-based methods have recently emerged for identifying CPPs. Although considerable progress has been made, existing methods still have limited feature representation capability, which restricts further performance improvements.
We propose a method called StackCPPred, which introduces three feature extraction methods based on the pairwise energy content of residues: RECM-composition, PseRECM and RECM-DWT. These features are used to train a stacking-based machine learning model to predict CPPs. On the CPP924 and CPPsite3 datasets with jackknife validation, StackCPPred achieved 94.5% and 78.3% accuracy, respectively, which was 2.9% and 5.8% higher than the state-of-the-art CPP predictors. StackCPPred can be a powerful tool for predicting CPPs and their uptake efficiency, facilitating hypothesis-driven experimental design and accelerating their applications in clinical therapy.
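The pipeline described above (energy-matrix features, a stacked classifier, jackknife validation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn and NumPy, uses a random placeholder in place of the real 20x20 residue energy content matrix, generates synthetic peptides rather than CPP924 or CPPsite3 data, and approximates only a composition-style feature (the mean matrix row over a peptide's residues). The names `PLACEHOLDER_RECM`, `recm_composition` and `random_peptide` are hypothetical, and the base learners are arbitrary choices. Jackknife validation is treated here as leave-one-out cross-validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
rng = np.random.default_rng(0)

# ASSUMPTION: random symmetric stand-in for the real 20x20 residue
# energy content matrix; these are NOT the published values.
_m = rng.normal(size=(20, 20))
PLACEHOLDER_RECM = (_m + _m.T) / 2


def recm_composition(peptide, recm=PLACEHOLDER_RECM):
    """Hypothetical composition-style feature: replace each residue with
    its matrix row, then average over the sequence (20-dim vector)."""
    rows = np.array([recm[AA_INDEX[aa]] for aa in peptide])
    return rows.mean(axis=0)


def random_peptide(length, weights):
    """Draw a synthetic peptide with the given residue probabilities."""
    return "".join(rng.choice(list(AMINO_ACIDS), size=length, p=weights))


# Synthetic toy data only: "positive" peptides are enriched in R and K,
# "negative" peptides are drawn uniformly.
uniform = np.full(20, 1 / 20)
cationic = uniform.copy()
cationic[[AA_INDEX["R"], AA_INDEX["K"]]] *= 6
cationic /= cationic.sum()

peptides = [random_peptide(15, cationic) for _ in range(15)]
peptides += [random_peptide(15, uniform) for _ in range(15)]
y = np.array([1] * 15 + [0] * 15)
X = np.vstack([recm_composition(p) for p in peptides])

# Stacking: base learners feed their predictions to a meta-learner.
stack = StackingClassifier(
    estimators=[
        ("svm", SVC(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=20, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=3,
)

# Jackknife (leave-one-out): each peptide is held out once for testing.
scores = cross_val_score(stack, X, y, cv=LeaveOneOut())
jackknife_accuracy = scores.mean()
print(f"Jackknife accuracy on synthetic data: {jackknife_accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and placeholder matrix, so it says nothing about the reported 94.5% and 78.3% figures. Swapping in the real matrix and datasets from the linked repository would be the first step toward a meaningful comparison.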
Source code and data can be downloaded from https://github.com/Excelsior511/StackCPPred.
Supplementary data are available at Bioinformatics online.