The genomic underpinnings of eukaryotic virus taxonomy: creating a sequence-based framework for family-level virus classification.
Affiliation
Nuffield Department of Medicine, University of Oxford, Peter Medawar Building, South Parks Road, Oxford, OX1 3SY, UK.
Publication information
Microbiome. 2018 Feb 20;6(1):38. doi: 10.1186/s40168-018-0422-7.
BACKGROUND
The International Committee on Taxonomy of Viruses (ICTV) classifies viruses into families, genera and species and provides a regulated system for their nomenclature that is universally used in virus descriptions. Virus taxonomic assignments have traditionally been based upon virus phenotypic properties such as host range, virion morphology and replication mechanisms, particularly at family level. However, gene sequence comparisons provide a clearer guide to their evolutionary relationships and provide the only information that may guide the incorporation of viruses detected in environmental (metagenomic) studies that lack any phenotypic data.
RESULTS
The current study sought to determine whether the existing virus taxonomy could be reproduced by examining genetic relationships through the extraction of protein-coding gene signatures and genome organisational features. We found large-scale consistency between genetic relationships and taxonomic assignments for viruses of all genome configurations and genome sizes. The analysis pipeline, which we have called 'Genome Relationships Applied to Virus Taxonomy' (GRAViTy), was highly effective at reproducing the current assignments of viruses at family level as well as inter-family groupings into orders. Its ability to correctly differentiate assigned viruses from unassigned viruses, and to classify them into the correct taxonomic group, was evaluated by threefold cross-validation. This predicted family membership of eukaryotic viruses with close to 100% accuracy and specificity, potentially enabling the algorithm to predict assignments for the vast corpus of metagenomic sequences consistently with ICTV taxonomy rules. In an evaluation run of GRAViTy, about one half (460/921) of (near)-complete genome sequences from several large published metagenomic eukaryotic virus datasets were assigned to 127 novel family-level groupings. If corroborated by other analysis methods, these would potentially more than double the number of eukaryotic virus families in the ICTV taxonomy.
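The threefold cross-validation mentioned above can be illustrated with a minimal sketch. It is not the GRAViTy implementation: the abstract does not describe the classifier or the feature encoding, so the nearest-neighbour rule, the distance threshold for calling a genome "unassigned", the two-dimensional feature vectors and the family names below are all assumptions chosen only to show how accuracy and specificity are obtained from held-out folds.

```python
import math
from collections import Counter

def stratified_folds(labels, k=3):
    """Split sample indices into k folds, round-robin within each family."""
    folds = [[] for _ in range(k)]
    seen = Counter()
    for idx, label in enumerate(labels):
        folds[seen[label] % k].append(idx)
        seen[label] += 1
    return folds

def nearest_label(query, train_idx, features, labels, threshold):
    """Label of the closest training sample, or 'unassigned' if it is too far."""
    best = min(train_idx, key=lambda i: math.dist(query, features[i]))
    if math.dist(query, features[best]) > threshold:
        return "unassigned"
    return labels[best]

def cross_validate(features, labels, k=3, threshold=3.0):
    """Predict every sample once, using only the other k-1 folds for training."""
    predictions = [None] * len(labels)
    for held_out in stratified_folds(labels, k):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        for i in held_out:
            predictions[i] = nearest_label(
                features[i], train_idx, features, labels, threshold
            )
    return predictions

def accuracy(labels, predictions):
    """Fraction of samples whose predicted family matches the true family."""
    return sum(t == p for t, p in zip(labels, predictions)) / len(labels)

def specificity(labels, predictions, family):
    """TN / (TN + FP) for one family, computed over samples outside it."""
    negatives = [p for t, p in zip(labels, predictions) if t != family]
    return sum(p != family for p in negatives) / len(negatives)

# Made-up 2D "genome signatures" for two hypothetical families.
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["FamilyA", "FamilyA", "FamilyA", "FamilyB", "FamilyB", "FamilyB"]

predictions = cross_validate(features, labels)
print(accuracy(labels, predictions))
print(specificity(labels, predictions, "FamilyA"))
```

Each genome is classified exactly once, by a model that never saw it during training, which is what makes the resulting accuracy and specificity estimates of performance on unseen sequences. The distance threshold is one simple way to let a classifier decline to assign a sequence to any known family.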
CONCLUSIONS
A rapid and objective means to explore metagenomic viral diversity and make informed recommendations for their assignments at each taxonomic layer is essential. GRAViTy provides one means to make rule-based assignments at family and order levels in a manner that preserves the integrity and underlying organisational principles of the current ICTV taxonomy framework. Such methods are increasingly required as the vast virosphere is explored.