PhaVIP: Phage VIrion Protein classification based on chaos game representation and Vision Transformer.

Affiliation

Department of Electrical Engineering, City University of Hong Kong, Hong Kong (SAR), China.

Publication information

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i30-i39. doi: 10.1093/bioinformatics/btad229.

Abstract

MOTIVATION

As viruses that mainly infect bacteria, phages are key players across a wide range of ecosystems. Analyzing phage proteins is indispensable for understanding phages' functions and roles in microbiomes. High-throughput sequencing enables us to obtain phages from different microbiomes at low cost. However, compared to the fast accumulation of newly identified phages, phage protein classification remains difficult. In particular, a fundamental need is to annotate virion proteins, that is, the structural proteins such as major tail and baseplate proteins. Although there are experimental methods for virion protein identification, they are too expensive or time-consuming, leaving a large number of proteins unclassified. Thus, there is a great demand for a computational method for fast and accurate phage virion protein (PVP) classification.

RESULTS

In this work, we adapted the state-of-the-art image classification model, Vision Transformer, to conduct virion protein classification. By encoding protein sequences into unique images using chaos game representation, we can leverage Vision Transformer to learn both local and global features from sequence "images". Our method, PhaVIP, has two main functions: classifying PVP and non-PVP sequences and annotating the types of PVP, such as capsid and tail. We tested PhaVIP on several datasets with increasing difficulty and benchmarked it against alternative tools. The experimental results show that PhaVIP has superior performance. After validating the performance of PhaVIP, we investigated two applications that can use the output of PhaVIP: phage taxonomy classification and phage host prediction. The results showed the benefit of using classified proteins over all proteins.
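As an illustration of the encoding idea described above, the following is a minimal sketch of a chaos game representation for protein sequences, followed by the patch split that a Vision Transformer applies to its input image. It is not PhaVIP's implementation: the 20-gon vertex layout, the residue ordering, the contraction factor `step`, the image size, and the function names `cgr_image` and `to_patches` are all assumptions made for this example.

```python
import math
import numpy as np

# 20 standard amino acids; this ordering is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed layout: one vertex of a regular 20-gon on the unit circle per residue.
VERTICES = {
    aa: (math.cos(2 * math.pi * i / len(AMINO_ACIDS)),
         math.sin(2 * math.pi * i / len(AMINO_ACIDS)))
    for i, aa in enumerate(AMINO_ACIDS)
}


def cgr_image(seq, size=64, step=0.5):
    """Return a size x size count image of the chaos game trajectory of seq.

    Starting at the centre, each residue moves the current point a
    fraction `step` of the way towards that residue's vertex. Every
    visited point increments one pixel. Residues outside the 20
    standard letters are skipped.
    """
    img = np.zeros((size, size), dtype=np.float32)
    x, y = 0.0, 0.0
    for aa in seq.upper():
        if aa not in VERTICES:
            continue  # skip ambiguous residues such as X or B
        vx, vy = VERTICES[aa]
        x += step * (vx - x)
        y += step * (vy - y)
        # Map coordinates from [-1, 1] to pixel indices in [0, size - 1].
        col = min(int((x + 1) / 2 * size), size - 1)
        row = min(int((y + 1) / 2 * size), size - 1)
        img[row, col] += 1
    return img


def to_patches(img, patch=16):
    """Split a square image into flattened, non-overlapping patches.

    A Vision Transformer treats each patch as one input token, so
    attention across patches can relate distant image regions while
    each patch keeps its local pixel pattern.
    """
    h, w = img.shape
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    grid = img.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    return grid.reshape(-1, patch * patch)


if __name__ == "__main__":
    image = cgr_image("MKTAYIAKQR")   # hypothetical 10-residue sequence
    tokens = to_patches(image)
    print(image.shape, tokens.shape)  # (64, 64) (16, 256)
```

Because each point depends on all preceding residues, nearby pixels tend to share recent sequence context, which is one reason an image model can be applied to such an encoding. The contraction factor and image resolution would need tuning in practice.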

AVAILABILITY AND IMPLEMENTATION

The web server of PhaVIP is available via: https://phage.ee.cityu.edu.hk/phavip. The source code of PhaVIP is available via: https://github.com/KennthShang/PhaVIP.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a36/10311294/dd74bf3bbd99/btad229f1.jpg
