
Automated classification of giant virus genomes using a random forest model built on trademark protein families.

Authors

Ha Anh D, Aylward Frank O

Affiliations

Department of Biological Sciences, Virginia Tech, Blacksburg VA, 24061.

Center for Emerging, Zoonotic, and Arthropod-Borne Infectious Disease, Virginia Tech, Blacksburg VA, 24061.

Publication information

bioRxiv. 2023 Nov 13:2023.11.10.566645. doi: 10.1101/2023.11.10.566645.

DOI:10.1101/2023.11.10.566645
PMID:38014039
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10680617/
Abstract

Viruses of the phylum Nucleocytoviricota, often referred to as "giant viruses," are prevalent in various environments around the globe and play significant roles in shaping eukaryotic diversity and activities in global ecosystems. Given the extensive phylogenetic diversity within this viral group and the highly complex composition of their genomes, taxonomic classification of giant viruses, particularly of incomplete metagenome-assembled genomes (MAGs), can present a considerable challenge. Here we developed TIGTOG (Taxonomic Information of Giant viruses using Trademark Orthologous Groups), a machine learning-based approach to predict the taxonomic classification of novel giant virus MAGs based on profiles of protein family content. We applied a random forest algorithm to a training set of 1,531 quality-checked, phylogenetically diverse genomes using pre-selected sets of giant virus orthologous groups (GVOGs). The classification models were predictive of viral taxonomic assignments, with a cross-validation accuracy of 99.6% at the order level and 97.3% at the family level. We found that no individual GVOG or genome feature significantly influenced the algorithm's performance or the models' predictions, indicating that classification predictions were based on a comprehensive genomic signature, which reduced the need for a fixed set of marker genes for taxonomic assignment. Our classification models were validated with an independent test set of 823 giant virus genomes of varied genomic completeness and taxonomy and demonstrated an accuracy of 98.6% and 95.9% at the order and family level, respectively. Our results indicate that protein family profiles can be used to accurately classify large DNA viruses at different taxonomic levels and provide a fast and accurate method for the classification of giant viruses. This approach could easily be adapted to other viral groups.
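The general idea described in the abstract, training a random forest on protein-family profiles and estimating accuracy by cross-validation, can be illustrated with a minimal sketch. This is not the TIGTOG implementation: the data are synthetic, the feature matrix is assumed to be binary presence/absence, and the numbers of orders, genomes, and protein families, the presence probabilities, and the hyperparameters (`n_estimators=200`, 5 folds) are all hypothetical choices made for illustration. The sketch assumes scikit-learn and NumPy are available.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data, NOT the TIGTOG training set.
rng = np.random.default_rng(0)
n_orders, genomes_per_order, n_families = 3, 60, 30

# Hypothetical order labels (0, 1, 2), one per genome.
y = np.repeat(np.arange(n_orders), genomes_per_order)

# Assumption: each order carries its own block of 10 protein families
# with probability 0.9 and every other family with probability 0.1.
presence_prob = np.full((n_orders, n_families), 0.1)
for order in range(n_orders):
    presence_prob[order, order * 10:(order + 1) * 10] = 0.9

# Binary profile matrix: rows are genomes, columns are protein families.
X = (rng.random((y.size, n_families)) < presence_prob[y]).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)

# Stratified 5-fold cross-validation; the default score is accuracy.
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean cross-validation accuracy: {scores.mean():.3f}")

# Fit on all genomes and inspect how much any single family matters.
clf.fit(X, y)
print(f"largest single-feature importance: {clf.feature_importances_.max():.3f}")
```

In this toy setting each class signal is spread over ten families, so no single column dominates the feature importances. That loosely mirrors the abstract's observation that predictions rest on a broad genomic signature rather than on a fixed set of marker genes.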

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9771/10680617/4684cb14594e/nihpp-2023.11.10.566645v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9771/10680617/4c6e25a11689/nihpp-2023.11.10.566645v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9771/10680617/d51b95ecec12/nihpp-2023.11.10.566645v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9771/10680617/f8c3d63ff325/nihpp-2023.11.10.566645v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9771/10680617/2311ff075f77/nihpp-2023.11.10.566645v1-f0005.jpg

Similar articles

1. Automated classification of giant virus genomes using a random forest model built on trademark protein families. bioRxiv. 2023 Nov 13:2023.11.10.566645. doi: 10.1101/2023.11.10.566645.
2. Automated classification of giant virus genomes using a random forest model built on trademark protein families. Npj Viruses. 2024 Mar 8;2(1):9. doi: 10.1038/s44298-024-00021-9.
3. A phylogenomic framework for charting the diversity and evolution of giant viruses. PLoS Biol. 2021 Oct 27;19(10):e3001430. doi: 10.1371/journal.pbio.3001430. eCollection 2021 Oct.
4. Conservative taxonomy and quality assessment of giant virus genomes with GVClass. Npj Viruses. 2024 Nov 25;2(1):60. doi: 10.1038/s44298-024-00069-7.
5. Mriyaviruses: small relatives of giant viruses. mBio. 2024 Jul 17;15(7):e0103524. doi: 10.1128/mbio.01035-24. Epub 2024 Jun 4.
6. The genomic underpinnings of eukaryotic virus taxonomy: creating a sequence-based framework for family-level virus classification. Microbiome. 2018 Feb 20;6(1):38. doi: 10.1186/s40168-018-0422-7.
7. Diversity and genomics of giant viruses in the North Pacific Subtropical Gyre. Front Microbiol. 2022 Nov 25;13:1021923. doi: 10.3389/fmicb.2022.1021923. eCollection 2022.
8. Widespread Distribution and Evolution of Poxviral Entry-Fusion Complex Proteins in Giant Viruses. Microbiol Spectr. 2023 Mar 13;11(2):e0494422. doi: 10.1128/spectrum.04944-22.
9. Giant Virus Infection Signatures Are Modulated by Euphotic Zone Depth Strata and Iron Regimes of the Subantarctic Southern Ocean. mSystems. 2023 Apr 27;8(2):e0126022. doi: 10.1128/msystems.01260-22. Epub 2023 Feb 16.
10. Adaptation strategies of giant viruses to low-temperature marine ecosystems. ISME J. 2024 Jan 8;18(1). doi: 10.1093/ismejo/wrae162.