BPAGS: a web application for bacteriocin prediction via feature evaluation using alternating decision tree, genetic algorithm, and linear support vector classifier.

Author information

Akhter Suraiya, Miller John H

Affiliations

School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States.

School of Engineering and Applied Sciences, Washington State University Tri-Cities, Richland, WA, United States.

Publication information

Front Bioinform. 2024 Jan 10;3:1284705. doi: 10.3389/fbinf.2023.1284705. eCollection 2023.

Abstract

Bacteriocins have emerged as a promising basis for new drugs to combat antibiotic resistance, given their ability to kill bacteria with both broad and narrow natural spectra. This creates a need for a precise and efficient computational model that can predict novel bacteriocins. Machine learning can learn patterns and features from bacteriocin sequences that are difficult to capture with sequence matching-based methods, which makes it a potentially superior choice for accurate prediction. In this study, we created a web application for predicting bacteriocins using a machine learning approach. The feature sets used in the application were chosen with feature evaluation methods based on alternating decision tree (ADTree), genetic algorithm (GA), and linear support vector classifier (linear SVC). Initially, candidate features were extracted from the physicochemical, structural, and sequence-profile attributes of both bacteriocin and non-bacteriocin protein sequences. We first assessed the candidate features using the Pearson correlation coefficient, then evaluated them separately with ADTree, GA, and linear SVC to eliminate unnecessary features. Finally, we constructed random forest (RF), support vector machine (SVM), decision tree (DT), logistic regression (LR), k-nearest neighbors (KNN), and Gaussian naïve Bayes (GNB) models using the reduced feature sets. The overall top-performing model was an SVM with ADTree-reduced features, which achieved an accuracy of 99.11% and an AUC of 0.9984 on the testing dataset. We also compared the best-performing model for each reduced feature set against our previously developed software solution, a sequence alignment-based tool, and a deep-learning approach. A web application, BPAGS (Bacteriocin Prediction based on ADTree, GA, and linear SVC), was developed to incorporate the predictive models built on the ADTree-, GA-, and linear SVC-based feature sets. Currently, the web-based tool provides classification results with associated probability values and offers an option to add new samples to the training data to improve predictive efficacy. BPAGS is freely accessible at https://shiny.tricities.wsu.edu/bacteriocin-prediction/.
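
The first filtering stage described in the abstract, screening candidate features with the Pearson correlation coefficient, can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy keep-or-drop rule, the 0.9 threshold, the function names `pearson` and `drop_correlated`, and the toy feature values are all assumptions made for illustration. The abstract does not state how the correlation was applied, and the later ADTree, GA, and linear SVC stages are not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; treat it as 0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedy redundancy filter (an assumed rule, not taken from the paper).

    columns maps feature name -> list of values, one per protein sequence.
    A feature is kept only if its absolute correlation with every feature
    already kept is at or below the threshold. Dict insertion order decides
    which member of a correlated pair survives.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy feature table for five sequences (values are made up).
features = {
    "length": [30, 45, 60, 52, 38],
    "mol_weight": [3.3, 4.95, 6.6, 5.72, 4.18],  # proportional to length
    "net_charge": [2, -1, 3, 0, 1],
}

selected = drop_correlated(features, threshold=0.9)
print(selected)  # ['length', 'net_charge']
```

In this toy table `mol_weight` is a scaled copy of `length`, so it is dropped as redundant, while `net_charge` is only weakly correlated with `length` and is retained. A filter like this reduces redundancy between features; it does not measure how well a feature separates bacteriocins from non-bacteriocins, which is the role the abstract assigns to the subsequent ADTree, GA, and linear SVC evaluations.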
