
ViTAL: Vision TrAnsformer based Low coverage SARS-CoV-2 lineage assignment.

Affiliation

EnICS Labs, Engineering Department, Bar-Ilan University, Ramat Gan, Tel Aviv 5290002, Israel.

Publication information

Bioinformatics. 2024 Mar 4;40(3). doi: 10.1093/bioinformatics/btae093.

DOI:10.1093/bioinformatics/btae093
PMID:38374486
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10913383/
Abstract

MOTIVATION

Rapid spread of viral diseases such as Coronavirus disease 2019 (COVID-19) highlights an urgent need for efficient surveillance of virus mutation and transmission dynamics, which requires fast, inexpensive and accurate viral lineage assignment. The first two goals might be achieved through low-coverage whole-genome sequencing (LC-WGS) which enables rapid genome sequencing at scale and at reduced costs. Unfortunately, LC-WGS significantly diminishes the genomic details, rendering accurate lineage assignment very challenging.
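
As a rough illustration of why low coverage loses genomic detail, the sketch below computes mean coverage from read count and read length, then applies the standard Lander-Waterman (Poisson) approximation for the fraction of bases seen by at least one read. The function names, the 150-base read length, and the assumption of uniformly random read placement are illustrative and do not come from the article. The genome length used is that of the SARS-CoV-2 reference genome, about 29,903 bases.

```python
import math

# Approximate length of the SARS-CoV-2 reference genome, in bases.
SARS_COV_2_GENOME_LENGTH = 29_903


def mean_coverage(num_reads, read_length, genome_length):
    """Mean sequencing depth: total sequenced bases divided by genome length."""
    return num_reads * read_length / genome_length


def expected_fraction_covered(coverage):
    """Lander-Waterman approximation of the fraction of bases hit by >= 1 read.

    Assumes reads start at uniformly random positions and ignores edge effects.
    """
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical run: 200 reads of 150 bases each (illustrative numbers only).
    c = mean_coverage(200, 150, SARS_COV_2_GENOME_LENGTH)
    print(f"mean coverage: {c:.2f}x")
    print(f"expected fraction of genome covered: {expected_fraction_covered(c):.1%}")
```

Under these assumptions, 1× coverage is expected to leave roughly 37% of bases unobserved, and every observed base is typically supported by a single, possibly erroneous read.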

RESULTS

We present ViTAL, a novel deep learning algorithm specifically designed to perform lineage assignment of low coverage-sequenced genomes. ViTAL combines MinHash for genomic feature extraction with a Vision Transformer for fine-grained genome classification and lineage assignment. We show that ViTAL outperforms state-of-the-art tools across diverse coverage levels, reaching up to 87.7% lineage assignment accuracy at 1× coverage, where state-of-the-art tools such as UShER and Kraken2 achieve accuracies of 5.4% and 27.4%, respectively. ViTAL achieves comparable accuracy with up to 8× lower coverage than state-of-the-art tools. We explore ViTAL's ability to identify the lineages of novel genomes, i.e. genomes the Vision Transformer was not trained on, and show how ViTAL can be applied to preliminary phylogenetic placement of novel variants.
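
The abstract does not specify how ViTAL builds its MinHash features, so the following is only a minimal sketch of the general MinHash technique (a bottom-k sketch over k-mers, in the style popularized by Mash), not the authors' implementation. The k-mer length, sketch size, choice of BLAKE2b as the hash function, and all function names are assumptions made for illustration. How such a sketch would be encoded as input to a Vision Transformer is not shown.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer (BLAKE2b is an arbitrary choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(seq, k=16, size=64):
    """Bottom-k MinHash sketch: the `size` smallest distinct k-mer hashes, sorted."""
    hashes = {hash64(km) for km in kmers(seq.upper(), k)}
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=64):
    """Estimate k-mer Jaccard similarity from two bottom-k sketches.

    Takes the `size` smallest hashes of the merged sketches and counts how
    many of them occur in both inputs.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    # Toy sequences, not real viral genomes.
    a = "ACGTTGCAAGGCTTAACCGGTTACGATCGATCGGCTA" * 3
    b = a[:60] + "T" + a[61:]  # same sequence with one base overwritten
    sa, sb = minhash_sketch(a, k=8, size=32), minhash_sketch(b, k=8, size=32)
    print(f"estimated Jaccard similarity: {jaccard_estimate(sa, sb, size=32):.2f}")
```

A fixed-size sketch of this kind summarizes a genome compactly regardless of how many reads were sequenced, which is one plausible reason such features suit low-coverage input.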

AVAILABILITY AND IMPLEMENTATION

The data underlying this article are available at https://github.com/zuherJahshan/vital and can be accessed via DOI 10.5281/zenodo.10688110.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64bd/10913383/6230fece5237/btae093f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64bd/10913383/7fbc63ed7ecc/btae093f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64bd/10913383/3723602fb653/btae093f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64bd/10913383/1de9af8ca6a5/btae093f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64bd/10913383/aeea138c6094/btae093f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64bd/10913383/5162f75ed6ac/btae093f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64bd/10913383/3372b1c0b367/btae093f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64bd/10913383/9c499883325f/btae093f8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64bd/10913383/168de9b86a8a/btae093f9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64bd/10913383/46d93ad34edc/btae093f10.jpg


