Department of Biochemistry and Molecular Biology, Mississippi State University, Mississippi, United States of America.
PLoS Comput Biol. 2011 Jul;7(7):e1002101. doi: 10.1371/journal.pcbi.1002101. Epub 2011 Jul 14.
Cell-penetrating peptides (CPPs) are peptides that can traverse cell membranes to enter cells. Once inside the cell, different CPPs can localize to different cellular components and perform different roles. Some form pore-forming complexes that destroy cells, while others localize to various organelles. Using machine learning methods to predict potential new CPPs will enable more rapid screening for applications such as drug delivery. We have investigated how the composition of training datasets influences the ability of support vector machines (SVMs) to classify peptides as cell-penetrating. We identified 111 known CPPs and 34 known non-penetrating peptides from the literature and commercial vendors, and used several approaches to build training datasets for the classifiers. Features were calculated from the datasets using a set of basic biochemical properties combined with features from the literature determined to be relevant in the prediction of CPPs. Our results using different training datasets confirm the importance of a balanced training set with approximately equal numbers of positive and negative examples. The SVM-based classifiers have greater classification accuracy than previously reported methods for the prediction of CPPs, and because they use primary biochemical properties of the peptides as features, these classifiers provide insight into the properties needed for cell penetration. To confirm our SVM classifications, a subset of peptides classified as either penetrating or non-penetrating was selected for synthesis and experimental validation. All of the synthesized peptides predicted to be CPPs were shown to be penetrating.
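The abstract does not list the exact features or balancing procedure, so the following is only a minimal sketch under stated assumptions of the two preprocessing ideas it mentions: computing simple sequence-derived biochemical features, and building a balanced training set. The feature set (length, a crude Lys/Arg minus Asp/Glu net charge, arginine fraction, hydrophobic fraction) and the function names `peptide_features` and `balance_by_undersampling` are hypothetical illustrations, and random undersampling of the larger class is just one possible way to balance the data, not necessarily the approach used in the study.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical residue set treated as "hydrophobic" for this illustration only.
HYDROPHOBIC = set("AVILMFWC")


def peptide_features(seq: str) -> Dict[str, float]:
    """Compute a few illustrative biochemical features from a one-letter sequence.

    These are assumed stand-ins, not the feature set used in the study.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty peptide sequence")
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return {
        "length": float(n),
        # Crude net charge near neutral pH: Lys/Arg minus Asp/Glu (His and termini ignored).
        "net_charge": float(positive - negative),
        "frac_arg": seq.count("R") / n,
        "frac_hydrophobic": sum(1 for aa in seq if aa in HYDROPHOBIC) / n,
    }


def balance_by_undersampling(
    positives: Sequence[str], negatives: Sequence[str], seed: int = 0
) -> List[Tuple[str, int]]:
    """Randomly undersample the larger class so both classes are the same size.

    Returns shuffled (sequence, label) pairs, with label 1 = penetrating, 0 = non-penetrating.
    """
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    pos = rng.sample(list(positives), k)
    neg = rng.sample(list(negatives), k)
    data = [(s, 1) for s in pos] + [(s, 0) for s in neg]
    rng.shuffle(data)
    return data


if __name__ == "__main__":
    # Basic region of HIV-1 TAT, a well-known arginine-rich CPP.
    print(peptide_features("GRKKRRQRRR"))
```

With the counts reported above (111 positives, 34 negatives), this undersampling scheme would keep all 34 negatives and a random 34 of the 111 positives, giving 68 training examples; the resulting feature vectors could then be passed to any SVM implementation.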