
The complexity landscape of viral genomes.

Affiliations

Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal.

Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago, 3810-193 Aveiro, Portugal.

Publication information

Gigascience. 2022 Aug 11;11. doi: 10.1093/gigascience/giac079.

DOI:10.1093/gigascience/giac079
PMID:35950839
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9366995/
Abstract

BACKGROUND

Viruses are among the shortest yet most abundant species, harboring minimal instructions to infect cells, adapt, multiply, and exist. However, despite the current substantial availability of viral genome sequences, the scientific repertory lacks a complexity landscape that automatically sheds light on the organization, relations, and fundamental characteristics of viral genomes.

RESULTS

This work provides a comprehensive landscape of the complexity (or quantity of information) of viral genomes, identifying the most redundant and the most complex groups with respect to their genome sequence while providing their distribution and characteristics at both large and local scales. Moreover, we identify and quantify the abundance of inverted repeats in viral genomes. For this purpose, we measure the sequence complexity of each available viral genome using data compression, demonstrating that adequate data compressors can efficiently quantify the complexity of viral genome sequences, including subsequences better represented by algorithmic sources (e.g., inverted repeats). Using a state-of-the-art genomic compressor on an extensive viral genome database, we show that double-stranded DNA viruses are, on average, the most redundant viruses while single-stranded DNA viruses are the least redundant. In contrast, double-stranded RNA viruses show lower redundancy than single-stranded RNA viruses. Furthermore, we extend the ability of data compressors to quantify local complexity (or information content) in viral genomes using complexity profiles, providing for the first time a direct complexity analysis of human herpesviruses. We also devise a feature-based classification methodology that can accurately distinguish viral genomes at different taxonomic levels without direct comparisons between sequences. This methodology combines data compression with simple measures such as GC-content percentage and sequence length, followed by machine learning classifiers.
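The measures described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract refers to a specialized genomic compressor, whereas this sketch substitutes the general-purpose `zlib` compressor from the standard library, which is far weaker on DNA. All function names, the window and step sizes, and the inverted-repeat proxy (the extra bits needed to encode a sequence after its reverse complement) are assumptions chosen for illustration.

```python
import random
import zlib

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse the sequence and swap A<->T and C<->G."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def compressed_bits(seq: str) -> int:
    """Size in bits of the zlib-compressed sequence (stand-in compressor)."""
    return 8 * len(zlib.compress(seq.encode("ascii"), 9))


def normalized_compression(seq: str) -> float:
    """Compressed bits divided by 2 bits per base (uniform 4-letter alphabet).

    Lower values indicate more redundancy; zlib overhead can push the
    value above 1 for short or random-like sequences.
    """
    return compressed_bits(seq) / (2 * len(seq))


def inverted_repeat_bits(seq: str) -> int:
    """Crude proxy (assumption): bits needed for seq after seeing its
    reverse complement. A value well below compressed_bits(seq) suggests
    inverted repeats that a plain compressor cannot exploit."""
    rc = reverse_complement(seq)
    return compressed_bits(rc + seq) - compressed_bits(rc)


def complexity_profile(seq: str, window: int = 500, step: int = 250) -> list[float]:
    """Normalized compression of each sliding window along the sequence."""
    return [
        normalized_compression(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


def features(seq: str) -> dict[str, float]:
    """Alignment-free feature vector: length, GC content, complexity."""
    return {
        "length": float(len(seq)),
        "gc": gc_content(seq),
        "nc": normalized_compression(seq),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    toy_genome = "".join(rng.choice("ACGT") for _ in range(3000))  # synthetic data
    print(features(toy_genome))
    print(complexity_profile(toy_genome)[:3])
```

Feature dictionaries of this kind, one per genome, could then be passed to any standard classifier. Because each genome is summarized independently, no pairwise sequence comparison is required, which is the property the abstract highlights.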

CONCLUSIONS

This article presents methodologies and findings that are highly relevant for understanding the patterns of similarity and singularity between viral groups, opening new frontiers for studying viral genomes' organization while depicting the complexity trends and classification components of these genomes at different taxonomic levels. The whole study is supported by an extensive website (https://asilab.github.io/canvas/) for comprehending the viral genome characterization using dynamic and interactive approaches.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8809/9366995/947fca46ac08/giac079fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8809/9366995/95901af438e3/giac079fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8809/9366995/b19a82014fbd/giac079fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8809/9366995/0144a825edc7/giac079fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8809/9366995/87af7738d40d/giac079fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8809/9366995/ac1174b05323/giac079fig6.jpg
