ViTAL: Vision TrAnsformer based Low coverage SARS-CoV-2 lineage assignment.

Author information

EnICS Labs, Engineering Department, Bar-Ilan University, Ramat Gan, Tel Aviv 5290002, Israel.

Publication information

Bioinformatics. 2024 Mar 4;40(3). doi: 10.1093/bioinformatics/btae093.

Abstract

MOTIVATION

Rapid spread of viral diseases such as Coronavirus disease 2019 (COVID-19) highlights an urgent need for efficient surveillance of virus mutation and transmission dynamics, which requires fast, inexpensive and accurate viral lineage assignment. The first two goals might be achieved through low-coverage whole-genome sequencing (LC-WGS) which enables rapid genome sequencing at scale and at reduced costs. Unfortunately, LC-WGS significantly diminishes the genomic details, rendering accurate lineage assignment very challenging.

RESULTS

We present ViTAL, a novel deep learning algorithm specifically designed to perform lineage assignment of genomes sequenced at low coverage. ViTAL combines MinHash for genomic feature extraction with a Vision Transformer for fine-grained genome classification and lineage assignment. We show that ViTAL outperforms state-of-the-art tools across diverse coverage levels, reaching up to 87.7% lineage assignment accuracy at 1× coverage, where state-of-the-art tools such as UShER and Kraken2 achieve accuracies of 5.4% and 27.4%, respectively. ViTAL achieves comparable accuracy with up to 8× lower coverage than state-of-the-art tools. We explore ViTAL's ability to identify the lineages of novel genomes, i.e. genomes the Vision Transformer was not trained on. We show how ViTAL can be applied to preliminary phylogenetic placement of novel variants.
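As a rough illustration of the MinHash feature-extraction idea mentioned above, the following Python sketch computes a fixed-length signature from the k-mers of a sequence. It is not ViTAL's implementation: the function names, the k-mer length `k`, the signature size `num_hashes`, and the choice of BLAKE2b as the hash are all assumptions made for this example, and the abstract does not specify how the resulting features are arranged for the Vision Transformer.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer, seed):
    """Hash a k-mer to a 64-bit integer; the seed selects one hash function.

    BLAKE2b with a per-seed salt is an illustrative choice, not the
    hash used by ViTAL.
    """
    digest = hashlib.blake2b(
        kmer.encode("ascii"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(seq, k=8, num_hashes=64):
    """Keep, for each seeded hash function, the minimum hash over all k-mers.

    k and num_hashes are hypothetical defaults chosen for the example.
    """
    kmer_set = kmers(seq, k)
    if not kmer_set:
        raise ValueError("sequence is shorter than k")
    return [min(hash64(km, seed) for km in kmer_set)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Estimate k-mer set similarity as the fraction of matching positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Because the signature has the same length regardless of how many k-mers the reads contain, it can be computed from sparse, low-coverage data and still be compared position by position, for example with `estimate_jaccard(minhash_signature(read_a), minhash_signature(read_b))`.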

AVAILABILITY AND IMPLEMENTATION

The data underlying this article are available at https://github.com/zuherJahshan/vital and can be accessed via DOI 10.5281/zenodo.10688110.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64bd/10913383/6230fece5237/btae093f1.jpg
