Bergamini Carlo M, Bianchi Nicoletta, Giaccone Valerio, Catellani Paolo, Alberghini Leonardo, Stella Alessandra, Biffani Stefano, Yaddehige Sachithra Kalhari, Bobbo Tania, Taccioli Cristian
Department of Neuroscience and Rehabilitation, University of Ferrara, Via L. Borsari 46, 44121 Ferrara, Italy.
Department of Translational Medicine, University of Ferrara, Via L. Borsari 46, 44121 Ferrara, Italy.
Biology (Basel). 2022 Jul 7;11(7):1024. doi: 10.3390/biology11071024.
Probiotic bacteria are microorganisms with beneficial effects on human health and are currently used in numerous food supplements. However, no existing selection process can effectively distinguish probiotic from non-probiotic organisms on the basis of their genomic characteristics. In the current study, four Machine Learning algorithms were employed to identify probiotic bacteria based on their DNA characteristics. Although the prediction accuracies of all algorithms were excellent, the Neural Network returned the highest scores in all the evaluation metrics, discriminating probiotics from non-probiotics with an accuracy greater than 90%. Interestingly, our analysis also highlighted the information content of the tRNA sequences as the most important feature in distinguishing the two groups of organisms, probably because tRNAs have regulatory functions and might have allowed probiotics to evolve faster in the human gut environment. Through the methodology presented here, it was also possible to identify seven promising new probiotics that have a higher information content in their tRNA sequences than non-probiotics. In conclusion, we show for the first time that Machine Learning methods can discriminate human probiotic from non-probiotic organisms, and we identify the information within tRNA sequences as the most important genomic feature in distinguishing them.
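The abstract does not state how the "information content" of a tRNA sequence is computed. As an illustration only, the sketch below assumes it can be approximated by the Shannon entropy (in bits) of the sequence's k-mer distribution. The function name `shannon_entropy`, the parameter `k`, and the example sequences are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def shannon_entropy(sequence: str, k: int = 1) -> float:
    """Shannon entropy (bits) of the k-mer distribution of a sequence.

    Assumption: this is one common proxy for sequence information content;
    it is not necessarily the measure used in the study.
    """
    seq = sequence.upper()
    # Collect all overlapping k-mers in the sequence.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    # H = -sum(p * log2(p)) over the observed k-mer frequencies.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical toy sequences, not real tRNA genes.
    for name, seq in [("uniform", "ACGT"), ("repetitive", "AAAA")]:
        print(name, shannon_entropy(seq))
```

A per-genome feature of this kind (for example, the mean entropy over all annotated tRNA genes) could then be passed to a classifier alongside other genomic features. Whether that matches the authors' pipeline is not stated in the abstract.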