Organisms and Environment Division, School of Biosciences, Cardiff University, Cardiff CF10 3AX, UK.
Public Health Wales, University Hospital of Wales, Cardiff CF14 4XW, UK.
Bioinformatics. 2020 Mar 1;36(6):1681-1688. doi: 10.1093/bioinformatics/btz814.
Influenza viruses represent a global public health burden due to annual epidemics and pandemic potential. Because of the rapidly evolving RNA genome, inter-species transmission, intra-host variation, and noise in short-read data, reads can be lost during mapping, and de novo assembly can be time-consuming and prone to misassembly. We assessed read loss during mapping and designed a graph-based classifier, VAPOR, for selecting mapping references, validating assemblies, and detecting strains of non-human origin.
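The idea of choosing a mapping reference directly from the reads can be illustrated with a much simpler stand-in. The sketch below is not VAPOR's graph-based algorithm, which the abstract does not describe in detail; it only ranks candidate references by the fraction of their k-mers that also occur in the reads. The function names `kmers` and `pick_reference`, the default `k`, and the scoring rule are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pick_reference(reads, references, k=5):
    """Rank candidate references by k-mer overlap with the reads.

    Hypothetical simplification: each reference is scored by the fraction
    of its k-mers that occur anywhere in the read set. This is a plain
    set-overlap heuristic, not the classifier described in the paper.
    """
    # Pool the k-mers from every read into one set.
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    # Score each reference; a reference shorter than k gets a score of 0.
    scores = {}
    for name, seq in references.items():
        ref_kmers = kmers(seq, k)
        shared = len(ref_kmers & read_kmers)
        scores[name] = shared / len(ref_kmers) if ref_kmers else 0.0

    best = max(scores, key=scores.get)
    return best, scores
```

With toy sequences, `pick_reference(["ACGTACGG", "CGGTCAAT"], {"refA": "ACGTACGGTCAATCG", "refB": "TTGACCGGATTACAG"})` would favour `refA`, since the reads share no 5-mers with `refB`. A real pipeline would also need to handle reverse complements, sequencing errors, and much larger reference databases.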
Standard human reference viruses were insufficient for mapping diverse influenza samples in simulation. VAPOR retrieved references for 257 real whole-genome sequencing samples with a mean of >99.8% identity to assemblies, and increased the proportion of mapped reads by up to 13.3% compared to standard references. VAPOR has the potential to improve the robustness of bioinformatics pipelines for surveillance and could be adapted to other RNA viruses.
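The identity figure above compares a retrieved reference with the assembly of the same sample. The abstract does not state how identity was computed, so the following is only a minimal sketch under one common convention: the two sequences are already aligned to equal length, gap columns (`-`) are skipped, and identity is the percentage of remaining columns that match. The function name `percent_identity` and the gap handling are assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the non-gap columns of a pairwise alignment.

    Assumes both strings come from the same alignment (equal length,
    '-' marks a gap). Columns with a gap in either sequence are ignored;
    this convention is an assumption, not the paper's stated definition.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0
```

Other conventions, such as counting gaps as mismatches or dividing by the length of the shorter sequence, give different values for the same alignment, so the definition should be stated whenever identity is reported.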
VAPOR is available at https://github.com/connor-lab/vapor.
Supplementary data are available at Bioinformatics online.