Deep learning neural network development for the classification of bacteriocin sequences produced by lactic acid bacteria.

Author information

González Lady L, Arias-Serrano Isaac, Villalba-Meneses Fernando, Navas-Boada Paulo, Cruz-Varela Jonathan

Affiliations

School of Biological Sciences and Engineering, University Yachay Tech, Urcuqui, Provincia de Imbabura, 100119, Ecuador.

Publication information

F1000Res. 2025 Jun 20;13:981. doi: 10.12688/f1000research.154432.2. eCollection 2024.

DOI:10.12688/f1000research.154432.2

PMID:40786095

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12332477/

Abstract

BACKGROUND

The rise of antibiotic-resistant bacteria creates a pressing need to explore new natural compounds with innovative mechanisms to replace existing antibiotics. Bacteriocins offer promising alternatives for developing therapeutic and preventive strategies in livestock, aquaculture, and human health. In particular, those produced by lactic acid bacteria (LAB) are recognized as Generally Recognized as Safe (GRAS) and hold Qualified Presumption of Safety (QPS) status. This study aims to develop a deep learning model designed to classify bacteriocins by their LAB origin, using interpretable k-mer features and embedding vectors to enable applications in antimicrobial discovery.

METHODS

We developed a deep learning neural network for binary classification of bacteriocin amino acid sequences (BacLAB vs. Non-BacLAB). Features were extracted using k-mers (k = 3, 5, 7, 15, 20) and embedding vectors (EV). Ten feature combinations were tested (e.g., EV, EV+5-mers+7-mers). Sequences were filtered by length (50-2000 amino acids, AA) to ensure uniformity, and the classes were kept balanced (24,964 BacLAB vs. 25,000 Non-BacLAB). The model was trained on Google Colab, demonstrating that it is computationally accessible without specialized hardware.
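
As an illustration of the feature-extraction step described above, the following is a minimal sketch, not the authors' code. It assumes overlapping k-mers taken with a sliding window of stride 1 and simple count features; the function names (`kmers`, `length_filter`, `kmer_counts`) are hypothetical, and the abstract does not specify how the k-mers were tokenized or encoded.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence (sliding window, stride 1).

    Assumption: the abstract does not state the windowing scheme.
    """
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def length_filter(seqs, min_len=50, max_len=2000):
    """Keep sequences whose length lies in the 50-2000 AA range from the abstract."""
    return [s for s in seqs if min_len <= len(s) <= max_len]


def kmer_counts(seq, ks=(5, 7)):
    """Count k-mers for several k values at once, e.g. the 5-mer + 7-mer group."""
    counts = Counter()
    for k in ks:
        counts.update(kmers(seq, k))
    return counts
```

For example, `kmers("MKTAY", 3)` yields `["MKT", "KTA", "TAY"]`. A sequence of length n contributes n - k + 1 k-mers, so longer sequences contribute more features.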

RESULTS

The '5-mers+7-mers+EV' group achieved the best performance, with k-fold cross-validation (k=30) showing 9.90% loss, 90.14% accuracy, 90.30% precision, and 90.10% for both recall and F1 score. Fold 22 stood out with 8.50% loss, 91.47% accuracy, and 91.00% precision, recall, and F1 score. Five sets of 100 LAB-specific k-mers were identified, revealing conserved motifs. Despite the high accuracy, sequence length variation (50-2000 AA) may bias the k-mer representation in favor of longer sequences. Additionally, experimental validation is required to confirm the biological activity of predicted bacteriocins. These aspects highlight directions for future research.
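
To make the evaluation procedure concrete, here is a minimal sketch of a 30-fold split and the reported binary metrics. It is an illustration under stated assumptions, not the authors' pipeline: folds are contiguous and unshuffled, BacLAB is treated as the positive class (label 1), and the helper names `kfold_indices` and `binary_metrics` are hypothetical. The abstract does not say whether the metrics were averaged per class.

```python
def kfold_indices(n, k=30):
    """Yield (train, test) index lists for k contiguous folds over n samples.

    Assumption: no shuffling or stratification; in practice the data would
    normally be shuffled first.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With the 49,964 sequences mentioned in the Methods (24,964 + 25,000), `kfold_indices(49964, 30)` produces folds of 1,665 or 1,666 test sequences each; the per-fold metrics would then be averaged to obtain summary figures like those reported.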

CONCLUSIONS

The model developed in this study achieved results consistent with those reported in the reviewed literature, and it outperformed some studies by 3-10%. Its implementation in resource-limited settings is feasible via cloud platforms such as Google Colab. The identified k-mers could guide the design of synthetic antimicrobials, pending further in vitro validation.
