Phan My V T, Ngo Tri Tue, Hong Anh Pham, Baker Stephen, Kellam Paul, Cotten Matthew
Virus Genomics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Department of Viroscience, Erasmus Medical Center, Rotterdam, The Netherlands.
Virus Evol. 2018 Dec 15;4(2):vey035. doi: 10.1093/ve/vey035. eCollection 2018 Jul.
The Coronaviridae family of viruses encompasses a group of pathogens with zoonotic potential, as observed in previous outbreaks of severe acute respiratory syndrome coronavirus and Middle East respiratory syndrome coronavirus. Accordingly, it seems important to identify and document the coronaviruses in animal reservoirs, many of which are uncharacterized and potentially missed by more standard diagnostic assays. A combination of sensitive deep sequencing technology and computational algorithms is essential for virus surveillance, especially for characterizing novel or distantly related virus strains. Here, we explore the use of profile Hidden Markov Model-defined Pfam protein domains (Pfam domains) encoded by new sequences as a sequence classification tool. The encoded domains are first used in a triage step to identify potential coronavirus sequences, which are then processed with a Random Forest method to classify them to the genus level. Applying this algorithm to genomes assembled from agnostic deep sequencing data from surveillance of bats and rats in Dong Thap province (Vietnam) identified thirty-four bat coronavirus genomes and eleven rat coronavirus genomes. This collection of bat and rat coronavirus genomes provided essential information on the local diversity of coronaviruses, substantially expanded the number of full coronavirus genomes available from bats and rats, and may facilitate further molecular studies on this group of viruses.
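The two-stage approach described in the abstract (a domain-based triage followed by genus-level classification) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function names `triage` and `encode`, the marker set `CORONA_MARKERS`, the contig identifiers, and the domain names, which are examples only and should be checked against Pfam. The sketch covers only the triage and the conversion of domain hits into count vectors; it does not train a classifier.

```python
from collections import Counter

# Illustrative marker set (an assumption, not taken from the paper).
CORONA_MARKERS = {"Corona_S2", "Corona_nucleoca", "Corona_M"}


def triage(domain_hits, markers=CORONA_MARKERS, min_markers=1):
    """Keep sequences encoding at least `min_markers` distinct marker domains."""
    return {
        seq_id: domains
        for seq_id, domains in domain_hits.items()
        if len(markers.intersection(domains)) >= min_markers
    }


def encode(domain_hits):
    """Turn per-sequence domain lists into count vectors over a sorted vocabulary."""
    vocabulary = sorted({d for domains in domain_hits.values() for d in domains})
    rows = {}
    for seq_id, domains in domain_hits.items():
        counts = Counter(domains)
        rows[seq_id] = [counts[d] for d in vocabulary]
    return vocabulary, rows


# Hypothetical domain hits, as might be parsed from a profile HMM search.
hits = {
    "contig_1": ["Corona_S2", "Corona_M", "Corona_nucleoca"],
    "contig_2": ["RdRP_1", "Peptidase_C3"],
    "contig_3": ["Corona_S2", "Corona_S2", "Macro"],
}

candidates = triage(hits)                   # drops contig_2 (no marker domain)
vocabulary, features = encode(candidates)   # one count vector per candidate
```

The resulting count vectors, together with genus labels for reference genomes, could then be passed to any Random Forest implementation (for example, scikit-learn's `RandomForestClassifier`) to train the genus-level classifier; that step is left out here to keep the sketch dependency-free.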