Redshaw Joseph, Ting Darren S J, Brown Alex, Hirst Jonathan D, Gärtner Thomas
School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, UK
Academic Ophthalmology, School of Medicine, University of Nottingham, Nottingham NG7 2UH, UK
Digit Discov. 2023 Feb 27;2(2):502-511. doi: 10.1039/d3dd00004d. eCollection 2023 Apr 11.
Antimicrobial peptides (AMPs) represent a potential solution to the growing problem of antimicrobial resistance, yet their identification through wet-lab experiments is a costly and time-consuming process. Accurate computational predictions would allow rapid screening of candidate AMPs, thereby accelerating the discovery process. Kernel methods are a class of machine learning algorithms that utilise a kernel function to transform input data into a new representation. When appropriately normalised, the kernel function can be regarded as a notion of similarity between instances. However, many expressive notions of similarity are not valid kernel functions, meaning they cannot be used with standard kernel methods such as the support-vector machine (SVM). The Kreĭn-SVM is a generalisation of the standard SVM that admits a much larger class of similarity functions. In this study, we propose and develop Kreĭn-SVM models for AMP classification and prediction by employing the Levenshtein distance and local alignment score as sequence similarity functions. Utilising two datasets from the literature, each containing more than 3000 peptides, we train models to predict general antimicrobial activity. Our best models achieve an AUC of 0.967 and 0.863 on the test sets of the two datasets, respectively, outperforming the in-house and literature baselines in both cases. We also curate a dataset of experimentally validated peptides, measured against two specific microbes, in order to evaluate the applicability of our methodology in predicting microbe-specific activity. In this case, our best models achieve an AUC of 0.982 and 0.891, respectively. Models to predict both general and microbe-specific activities are made available as web applications.
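The point that an expressive sequence similarity need not be a valid kernel can be illustrated with a minimal sketch. It assumes the similarity is simply the negative Levenshtein distance; the abstract does not state how the distance is turned into a similarity, so this choice, the helper names and the toy sequences are all illustrative assumptions rather than the authors' implementation. A valid kernel matrix must give a non-negative quadratic form for every vector, and the sketch shows a vector for which this fails.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def similarity_matrix(seqs):
    """Pairwise similarity, ASSUMED here to be the negative edit distance."""
    return [[-levenshtein(x, y) for y in seqs] for x in seqs]


def quadratic_form(K, v):
    """Compute v^T K v; a valid (positive semi-definite) kernel never makes this negative."""
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))


# Toy strings for illustration only; they are not taken from the paper's datasets.
peptides = ["KLAKLAK", "KLAKKLA", "GIGKFLH"]
K = similarity_matrix(peptides)

print(quadratic_form(K, [1, 1, 0]))   # negative, so K is not a valid kernel matrix
print(quadratic_form(K, [1, -1, 0]))  # positive, so K is indefinite rather than negative definite
```

A standard SVM requires a positive semi-definite kernel matrix, so an indefinite matrix such as `K` is the kind of input that a Kreĭn-space method is designed to accept.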