FSBI Central Research Institute for Epidemiology of the Federal Service for Surveillance of Consumer Rights Protection and Human Wellbeing, 111123 Moscow, Russia.
Moscow Institute of Physics and Technology, National Research University, 115184 Dolgoprudny, Russia.
Viruses. 2021 Oct 6;13(10):2006. doi: 10.3390/v13102006.
According to various estimates, only a small percentage of existing viruses have been discovered, and even fewer are represented in genomic databases. High-throughput sequencing technologies are developing rapidly, enabling large-scale screening of diverse biological samples for pathogen-associated nucleotide sequences; however, specific identification loci have yet to be established for many organisms. This problem particularly impedes viral screening because of the vast heterogeneity of viral genomes. In this paper, we present VirIdAl, a new bioinformatic pipeline for detecting and identifying viral pathogens in sequencing data. We also demonstrate the utility of the new software by applying it to viral screening of bat feces collected in the Moscow region, which revealed a wide variety of viruses associated with bats, insects, plants, and protozoa. Of particular note are reads from alpha- and betacoronaviruses, including a MERS-like bat virus, which once again indicate that bats are indeed reservoirs for many viral pathogens. In addition, we showed that alignment-based methods could not assign a taxon to a large proportion of reads; we therefore applied other approaches and showed that they can reveal additional viral agents in sequencing data. However, the incompleteness of viral databases remains a significant problem in studies of viral diversity and therefore necessitates combined approaches, including those based on machine learning methods.
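The observation that alignment-based methods leave many reads without a taxon can be made concrete with a small bookkeeping step. The following is a minimal Python sketch, assuming per-read taxonomic assignments from an alignment step are already available as `(read_id, taxon)` pairs, with `None` marking reads that received no acceptable hit. The function name `summarize_assignments` and this input format are hypothetical and are not taken from VirIdAl; the sketch only shows how the unassigned fraction could be measured and how those reads could be set aside for other (for example, machine-learning-based) classifiers.

```python
from collections import Counter
from typing import Iterable, Optional


def summarize_assignments(
    assignments: Iterable[tuple[str, Optional[str]]],
) -> tuple[dict[str, int], float, list[str]]:
    """Tally reads per taxon and collect reads the aligner could not classify.

    Hypothetical input: (read_id, taxon) pairs, where taxon is None when
    the alignment step found no acceptable hit for that read.
    Returns (reads per taxon, fraction unassigned, unassigned read IDs).
    """
    counts: Counter = Counter()
    unassigned: list[str] = []
    total = 0
    for read_id, taxon in assignments:
        total += 1
        if taxon is None:
            # No taxon from alignment: keep the read for alternative methods.
            unassigned.append(read_id)
        else:
            counts[taxon] += 1
    # Guard against an empty input to avoid division by zero.
    fraction_unassigned = len(unassigned) / total if total else 0.0
    return dict(counts), fraction_unassigned, unassigned


if __name__ == "__main__":
    reads = [
        ("r1", "Alphacoronavirus"),
        ("r2", None),
        ("r3", "Betacoronavirus"),
        ("r4", None),
    ]
    per_taxon, frac, leftover = summarize_assignments(reads)
    print(per_taxon, frac, leftover)
```

In this toy input, half of the reads remain unassigned, and their IDs are returned so they can be passed to a non-alignment classifier rather than discarded.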