Predicting bacteriophage hosts based on sequences of annotated receptor-binding proteins.

Affiliations

KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium.

Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium.

Publication information

Sci Rep. 2021 Jan 14;11(1):1467. doi: 10.1038/s41598-021-81063-4.

Abstract

Nowadays, bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. In order to alleviate this burden, we have developed a new machine-learning-based pipeline to predict bacteriophage hosts based on annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6 and 93.8% for different levels of sequence similarity in the collected data. Our model reaches a performance comparable to that of BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Therefore, our machine learning methods can be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.
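The PR-AUC metric quoted above can be illustrated with a minimal sketch. The abstract does not say how the authors computed the area, so the step-wise estimator below (commonly called average precision) is an assumption, and the function name, labels, and scores are hypothetical toy values rather than data from the study. For simplicity it treats a single host as a binary problem and does not group tied scores.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 if the RBP truly belongs to the host of interest, else 0.
    scores: model confidence that the RBP belongs to that host.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Each positive raises recall by 1/n_pos; weight it by the
            # precision observed at this rank.
            area += (true_pos / rank) / n_pos
    return area


# Hypothetical toy data: 1 = RBP from a phage infecting the host of interest.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
pr_auc = average_precision(y_true, y_score)
print(f"PR-AUC (average precision): {pr_auc:.3f}")
```

With several candidate hosts, one common option is to compute this score per host in a one-vs-rest fashion and then average; whether the paper aggregates its scores this way is not stated in the abstract.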

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f51c/7809048/8c283293ede8/41598_2021_81063_Fig1_HTML.jpg
