PHROG: families of prokaryotic virus proteins clustered using remote homology.

Author information

Terzian Paul, Olo Ndela Eric, Galiez Clovis, Lossouarn Julien, Pérez Bucio Rubén Enrique, Mom Robin, Toussaint Ariane, Petit Marie-Agnès, Enault François

Affiliations

Université Clermont Auvergne, CNRS, LMGE, F-63000 Clermont-Ferrand, France.

Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France.

Publication information

NAR Genom Bioinform. 2021 Aug 5;3(3):lqab067. doi: 10.1093/nargab/lqab067. eCollection 2021 Sep.

DOI:10.1093/nargab/lqab067

PMID:34377978

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8341000/

Abstract

Viruses are abundant, diverse and ancestral biological entities. Their diversity is high, both in terms of the number of different protein families encountered and in the sequence heterogeneity of each protein family. The recent increase in sequenced viral genomes constitutes a great opportunity to gain new insights into this diversity and consequently urges the development of annotation resources to help functional and comparative analysis. Here, we introduce PHROG (Prokaryotic Virus Remote Homologous Groups), a library of viral protein families generated using a new clustering approach based on remote homology detection by HMM profile-profile comparisons. Considering 17 473 reference (pro)viruses of prokaryotes, 868 340 of the total 938 864 proteins were grouped into 38 880 clusters that proved to be a 2-fold deeper clustering than using a classical strategy based on BLAST-like similarity searches, and yet to remain homogeneous. Manual inspection of similarities to various reference sequence databases led to the annotation of 5108 clusters (containing 50.6 % of the total protein dataset) with 705 different annotation terms, included in 9 functional categories, specifically designed for viruses. Hopefully, PHROG will be a useful tool to better annotate future prokaryotic viral sequences thus helping the scientific community to better understand the evolution and ecology of these entities.
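The counts quoted in the abstract imply a few ratios that it does not state outright, such as the share of proteins that ended up in a cluster and the share of clusters that received an annotation. The following is a minimal sketch of that arithmetic only; the constant and function names are illustrative choices, not part of the PHROG resource, and the last figure is an approximation because it starts from the rounded 50.6 % given in the abstract.

```python
# Figures quoted in the abstract (names below are illustrative, not from PHROG).
TOTAL_PROTEINS = 938_864        # proteins from the 17 473 reference (pro)viruses
CLUSTERED_PROTEINS = 868_340    # proteins grouped into clusters
N_CLUSTERS = 38_880             # number of clusters
ANNOTATED_CLUSTERS = 5_108      # clusters that were manually annotated
ANNOTATED_PROTEIN_SHARE = 0.506 # share of all proteins in annotated clusters


def derived_stats():
    """Return ratios implied by the abstract's counts."""
    return {
        # Share of all proteins that were placed in some cluster.
        "clustered_fraction": CLUSTERED_PROTEINS / TOTAL_PROTEINS,
        # Share of clusters that carry an annotation.
        "annotated_cluster_fraction": ANNOTATED_CLUSTERS / N_CLUSTERS,
        # Mean number of proteins per cluster (clustered proteins only).
        "mean_cluster_size": CLUSTERED_PROTEINS / N_CLUSTERS,
        # Approximate protein count in annotated clusters (from the rounded 50.6 %).
        "approx_annotated_proteins": ANNOTATED_PROTEIN_SHARE * TOTAL_PROTEINS,
    }


stats = derived_stats()
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Read together, these ratios suggest that a small fraction of the clusters (the annotated ones) accounts for about half of the proteins, which is consistent with the annotated clusters being, on average, much larger than the unannotated ones.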

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24d8/8341000/05ba378050d5/lqab067fig1.jpg
