Automated classification of giant virus genomes using a random forest model built on trademark protein families.

Authors

Ha Anh D, Aylward Frank O

Affiliations

Department of Biological Sciences, Virginia Tech, Blacksburg, VA, 24061, USA.

Center for Emerging, Zoonotic, and Arthropod-Borne Infectious Disease, Virginia Tech, Blacksburg, VA, 24061, USA.

Publication information

Npj Viruses. 2024 Mar 8;2(1):9. doi: 10.1038/s44298-024-00021-9.

DOI:10.1038/s44298-024-00021-9
PMID:40295679
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11721082/
Abstract

Viruses of the phylum Nucleocytoviricota, often referred to as "giant viruses," are prevalent in various environments around the globe and play significant roles in shaping eukaryotic diversity and activities in global ecosystems. Given the extensive phylogenetic diversity within this viral group and the highly complex composition of their genomes, taxonomic classification of giant viruses, particularly incomplete metagenome-assembled genomes (MAGs), can present a considerable challenge. Here we developed TIGTOG (Taxonomic Information of Giant viruses using Trademark Orthologous Groups), a machine learning-based approach to predict the taxonomic classification of novel giant virus MAGs based on profiles of protein family content. We applied a random forest algorithm to a training set of 1531 quality-checked, phylogenetically diverse Nucleocytoviricota genomes using pre-selected sets of giant virus orthologous groups (GVOGs). The classification models were predictive of viral taxonomic assignments with a cross-validation accuracy of 99.6% at the order level and 97.3% at the family level. We found that no individual GVOGs or genome features significantly influenced the algorithm's performance or the models' predictions, indicating that classification predictions were based on a comprehensive genomic signature, which reduced the necessity of a fixed set of marker genes for taxonomic assignment. Our classification models were validated with an independent test set of 823 giant virus genomes with varied genomic completeness and taxonomy and demonstrated an accuracy of 98.6% and 95.9% at the order and family level, respectively. Our results indicate that protein family profiles can be used to accurately classify large DNA viruses at different taxonomic levels and provide a fast and accurate method for the classification of giant viruses. This approach could easily be adapted to other viral groups.
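
The general idea in the abstract (a random forest trained on protein-family presence/absence profiles, then scored on held-out genomes) can be illustrated with a small standard-library sketch. This is not the TIGTOG implementation: the data are synthetic, the labels `order_A` and `order_B` are hypothetical, and every function name and parameter (`simulate`, `fit_forest`, 20 families, 25 trees) is an assumption made for illustration. In practice one would more likely use an established library implementation of random forests.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def build_tree(rows, labels, n_try, rng, depth=0, max_depth=8):
    """Grow one tree on binary features; a leaf is the majority label."""
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]
    best = None
    # Random feature subset at each node, as in a random forest.
    for j in rng.sample(range(len(rows[0])), n_try):
        left = [i for i, r in enumerate(rows) if r[j] == 0]
        right = [i for i, r in enumerate(rows) if r[j] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right])) / len(rows)
        if best is None or score < best[0]:
            best = (score, j, left, right)
    if best is None:  # no sampled feature separates the rows
        return Counter(labels).most_common(1)[0][0]
    _, j, left, right = best
    return (j,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       n_try, rng, depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       n_try, rng, depth + 1, max_depth))

def predict_tree(tree, row):
    """Walk from the root to a leaf: (feature, absent-branch, present-branch)."""
    while isinstance(tree, tuple):
        j, absent, present = tree
        tree = present if row[j] else absent
    return tree

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees trees, each on a bootstrap resample of the genomes."""
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_try, rng))
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def simulate(label, core, n, rng, n_families=20):
    """Synthetic genomes: 'core' families are usually present, others rarely.
    The 10% dropout loosely mimics incomplete genomes (an assumption)."""
    rows = [[1 if rng.random() < (0.9 if j in core else 0.1) else 0
             for j in range(n_families)] for _ in range(n)]
    return rows, [label] * n

rng = random.Random(42)
rows_a, labels_a = simulate("order_A", range(0, 10), 60, rng)
rows_b, labels_b = simulate("order_B", range(10, 20), 60, rng)
data = list(zip(rows_a + rows_b, labels_a + labels_b))
rng.shuffle(data)
train, test = data[:80], data[80:]  # independent held-out set

forest = fit_forest([r for r, _ in train], [y for _, y in train])
accuracy = sum(predict(forest, r) == y for r, y in test) / len(test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Because each tree sees a different bootstrap sample and a random subset of families at every split, no single family is required for a correct vote, which loosely mirrors the abstract's point that predictions rest on an overall genomic signature instead of a fixed marker set.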

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c659/11721082/4793168d8093/44298_2024_21_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c659/11721082/2048fb17bb10/44298_2024_21_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c659/11721082/d1cce833921b/44298_2024_21_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c659/11721082/8413732c549f/44298_2024_21_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c659/11721082/0b8086cdee91/44298_2024_21_Fig5_HTML.jpg
