PhaTYP: predicting the lifestyle for bacteriophages using BERT.

Affiliation

Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China SAR.

Publication information

Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac487.

Abstract

Bacteriophages (or phages), which infect bacteria, have two distinct lifestyles: virulent and temperate. Predicting the lifestyle of phages helps decipher their interactions with their bacterial hosts, aiding phages' applications in fields such as phage therapy. Because experimental methods for annotating phage lifestyles cannot keep pace with the rapid accumulation of sequenced phages, computational methods for predicting phage lifestyles have become an attractive alternative. Despite some promising results, computational lifestyle prediction remains difficult because of the limited number of known annotations and the sheer number of phage contigs assembled from metagenomic data. In particular, most existing tools cannot precisely predict the lifestyles of phages represented by short contigs. In this work, we develop PhaTYP (Phage TYPe prediction tool) to improve the accuracy of lifestyle prediction on short contigs. We design two training tasks, a self-supervised task and a fine-tuning task, to overcome these difficulties. We rigorously tested and compared PhaTYP with four state-of-the-art methods: DeePhage, PHACTS, PhagePred and BACPHLIP. The experimental results show that PhaTYP outperforms all these methods and achieves more stable performance on short contigs. In addition, we demonstrate the utility of PhaTYP for analyzing phage lifestyles in gut data from human neonates. This application shows that PhaTYP is a useful means of studying phages in metagenomic data and helps extend our understanding of microbial communities.
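The abstract names a self-supervised training task followed by fine-tuning, in a BERT-based model. As a rough illustration of what a BERT-style self-supervised task involves, the sketch below applies the standard BERT masked-token recipe to a contig represented as a sequence of integer tokens. About 15% of positions are selected as targets; of those, 80% are replaced by a mask token, 10% by a random token, and 10% are left unchanged. The token vocabulary, `MASK_ID`, `IGNORE` and `mask_tokens` are all hypothetical. This is a generic sketch, not PhaTYP's code, and it makes no claim about how PhaTYP actually tokenizes contigs.

```python
import random
from typing import List, Tuple

MASK_ID = 1     # hypothetical ID of the [MASK] token (IDs 0 and 1 reserved for special tokens)
IGNORE = -100   # label value meaning "position not scored by the loss" (a common convention)


def mask_tokens(tokens: List[int], vocab_size: int, mask_prob: float = 0.15,
                seed: int = 0) -> Tuple[List[int], List[int]]:
    """Generic BERT-style masking of one token sequence (hypothetical helper).

    Returns (inputs, labels): inputs is the corrupted sequence fed to the model,
    and labels holds the original token at selected positions and IGNORE elsewhere.
    """
    rng = random.Random(seed)
    inputs, labels = list(tokens), [IGNORE] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                      # position not selected as a prediction target
        labels[i] = tok                   # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID           # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(2, vocab_size)  # 10%: random non-special token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    # A toy "sentence" of hypothetical token IDs standing in for one contig.
    contig = [5, 9, 12, 7, 33, 5, 21, 8]
    masked, targets = mask_tokens(contig, vocab_size=50, mask_prob=0.3, seed=42)
    print(masked)
    print(targets)
```

In a fine-tuning stage, the pre-trained encoder would then typically be trained on labelled sequences with a classification head (here, virulent vs. temperate) instead of the token-recovery objective.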


