School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA.
School of Engineering and Applied Sciences, Washington State University Tri-Cities, Richland, WA, USA.
BMC Bioinformatics. 2023 Aug 17;24(1):313. doi: 10.1186/s12859-023-05330-z.
Antibiotic resistance is a major public health concern worldwide. As a result, researchers continually search for new compounds from which to develop antibiotic drugs that combat antibiotic-resistant bacteria. Bacteriocins are promising antimicrobial agents against antibiotic resistance because they exhibit both broad and narrow killing spectra. Sequence matching methods are widely used to identify bacteriocins by comparing candidate sequences with known bacteriocin sequences; however, these methods often fail to detect new bacteriocins because bacteriocin sequences are highly diverse. A machine learning approach can help find new, highly dissimilar bacteriocins for developing highly effective antibiotic drugs. The aim of this work is to develop a machine learning-based software tool called BaPreS (Bacteriocin Prediction Software) that uses an optimal set of features to detect bacteriocin protein sequences with high accuracy. We extracted candidate features from known bacteriocin and non-bacteriocin sequences by considering the physicochemical and structural properties of the protein sequences. We then reduced the feature set using statistical justifications and the recursive feature elimination technique. Finally, we built support vector machine (SVM) and random forest (RF) models using the selected features and used the best-performing model to implement the software tool.
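To make the feature extraction step more concrete, the following is a minimal sketch of how a protein sequence might be turned into a numeric feature vector. It is not the BaPreS implementation: the choice of features (amino acid composition plus mean Kyte-Doolittle hydropathy) and all function names are assumptions made for illustration, and the abstract does not state which physicochemical or structural features the tool actually uses.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (a standard published scale; its use here is
# an illustrative assumption, not a feature confirmed for BaPreS).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standard_residues(sequence):
    """Return the upper-cased residues of `sequence` that are standard amino acids."""
    residues = [r for r in sequence.upper() if r in HYDROPATHY]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return residues


def amino_acid_composition(sequence):
    """Fraction of each standard amino acid, in AMINO_ACIDS order."""
    residues = standard_residues(sequence)
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def mean_hydropathy(sequence):
    """Average Kyte-Doolittle hydropathy over the standard residues."""
    residues = standard_residues(sequence)
    return sum(HYDROPATHY[r] for r in residues) / len(residues)


def feature_vector(sequence):
    """Hypothetical 21-dimensional feature vector: 20 composition values + hydropathy."""
    return amino_acid_composition(sequence) + [mean_hydropathy(sequence)]


# Example call on an arbitrary toy peptide (not a real bacteriocin).
toy_features = feature_vector("ACAC")
```

Vectors of this kind, computed for known bacteriocin and non-bacteriocin sequences, could then be passed to a feature selection step and to SVM or RF classifiers, as the abstract describes.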
We applied BaPreS to an established dataset and evaluated its prediction performance. The results show that the tool achieves a prediction accuracy of 95.54% on test protein sequences. The tool also allows users to add new bacteriocin or non-bacteriocin sequences to the training dataset to further improve its predictive power. We compared the prediction performance of BaPreS with that of a popular sequence matching-based tool and a deep learning-based method, and BaPreS outperformed both.
BaPreS is a bacteriocin prediction tool that can be used to discover new, highly dissimilar bacteriocins for developing highly effective antibiotic drugs. The tool runs on Windows, Linux, and macOS. The open-source software package and its user manual are available at https://github.com/suraiya14/BaPreS.