iProbiotics: a machine learning platform for rapid identification of probiotic properties from whole-genome primary sequences.

Affiliations

State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China.

Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot 010018, China.

Publication Information

Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab477.

Abstract

Lactic acid bacteria consortia are commonly present in food, and some of these bacteria possess probiotic properties. However, discovering and experimentally validating probiotics requires extensive time and effort, so effective screening methods for identifying probiotics are of great interest. Advances in sequencing technology have generated massive amounts of genomic data, which enabled us to build a machine learning-based platform for this purpose. This study first compiled a comprehensive probiotic genome dataset from the probiotic database (PROBIO) and literature surveys. K-mer compositional analysis (k from 2 to 8) was then performed. It revealed diverse oligonucleotide compositions across strain genomes and noticeably more probiotic (P-) features in probiotic genomes than in non-probiotic genomes. To reduce noise and improve computational efficiency, the 87,376 k-mers were refined with an incremental feature selection (IFS) method. The model reached its maximum accuracy at 184 core features, with a prediction accuracy of 97.77% and an area under the curve (AUC) of 98.00%. We performed functional genomic analysis using annotations from the Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Rapid Annotation using Subsystem Technology (RAST) databases. We also analyzed genes associated with host gastrointestinal survival/settlement, carbohydrate utilization, drug resistance and virulence factors. Together, these analyses revealed that the distribution of P-features was biased toward genes and pathways related to probiotic function. Our results suggest that probiotic properties are determined not by a single gene but by a combination of k-mer genomic components, providing new insights into the identification and underlying mechanisms of probiotics. This work provides a novel, free online bioinformatics tool, iProbiotics, which should facilitate rapid screening for probiotics.
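The figure of 87,376 k-mers equals the number of possible DNA words for k = 2 to 8, i.e. 4^2 + 4^3 + ... + 4^8 = 87,376. The abstract does not describe the exact counting, normalization or ranking procedure, so the Python sketch below is only an illustration under stated assumptions. It assumes the following:

- k-mers are counted on a single strand with a sliding window.
- Windows containing ambiguous bases are skipped.
- Frequencies are normalized within each k.
- The feature ranking and the scoring function used by IFS are supplied by the caller.

The function names are hypothetical and are not part of iProbiotics.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_vocabulary(k_min=2, k_max=8):
    """Every possible k-mer over A/C/G/T for k_min <= k <= k_max, in a fixed order."""
    vocab = []
    for k in range(k_min, k_max + 1):
        vocab.extend("".join(p) for p in product(ALPHABET, repeat=k))
    return vocab


def kmer_frequencies(seq, k_min=2, k_max=8):
    """Relative k-mer frequencies, normalized separately for each k (assumed scheme).

    Windows containing non-ACGT characters (e.g. N) are skipped.
    """
    seq = seq.upper()
    freqs = {}
    for k in range(k_min, k_max + 1):
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        valid = {km: c for km, c in counts.items() if set(km) <= set(ALPHABET)}
        total = sum(valid.values())
        for km, c in valid.items():
            freqs[km] = c / total
    return freqs


def feature_vector(seq, vocab):
    """Fixed-length feature vector aligned to vocab; absent k-mers get 0.0."""
    k_lengths = {len(km) for km in vocab}
    freqs = kmer_frequencies(seq, min(k_lengths), max(k_lengths))
    return [freqs.get(km, 0.0) for km in vocab]


def incremental_feature_selection(ranked_features, evaluate):
    """Add features one at a time in ranked order and keep the best-scoring prefix.

    ranked_features: features sorted from most to least informative; the ranking
        criterion is not stated in the abstract and is left to the caller.
    evaluate: hypothetical callable mapping a feature subset to a score,
        e.g. cross-validated accuracy of a classifier trained on that subset.
    """
    best_subset, best_score, history = [], float("-inf"), []
    for n in range(1, len(ranked_features) + 1):
        subset = ranked_features[:n]
        score = evaluate(subset)
        history.append((n, score))
        if score > best_score:  # strict '>' keeps the smallest subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score, history


if __name__ == "__main__":
    vocab = kmer_vocabulary()
    print(len(vocab))  # 87376 = 4**2 + 4**3 + ... + 4**8
    vec = feature_vector("ACGTACGTNNACGT", vocab)
    print(len(vec))  # 87376
```

IFS retrains and scores the model once per prefix. Running it over all 87,376 ranked features one at a time is therefore expensive. Evaluating prefixes in coarser steps is one way to cut the cost, at the risk of missing the exact optimum (reported in the abstract as 184 core features).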
