Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders.

Affiliations

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Adolfo Ferrata 5, Pavia, 27100, Italy.

Department of Epidemiology, College of Public Health and Health Professions, University of Florida, 2004 Mowry Road, Gainesville, FL 32610, United States.

Publication information

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae535.

Abstract

The COVID-19 pandemic is marked by the successive emergence of new SARS-CoV-2 variants, lineages, and sublineages that outcompete earlier strains, largely due to factors like increased transmissibility and immune escape. We propose DeepAutoCoV, an unsupervised deep learning anomaly detection system, to predict future dominant lineages (FDLs). We define FDLs as viral (sub)lineages that will constitute >10% of all the viral sequences added to the GISAID, a public database supporting viral genetic sequence sharing, in a given week. DeepAutoCoV is trained and validated by assembling global and country-specific data sets from over 16 million Spike protein sequences sampled over a period of ~4 years. DeepAutoCoV successfully flags FDLs at very low frequencies (0.01%-3%), with median lead times of 4-17 weeks, and predicts FDLs between ~5 and ~25 times better than a baseline approach. For example, the B.1.617.2 vaccine reference strain was flagged as FDL when its frequency was only 0.01%, more than a year before it was considered for an updated COVID-19 vaccine. Furthermore, DeepAutoCoV outputs interpretable results by pinpointing specific mutations potentially linked to increased fitness and may provide significant insights for the optimization of public health 'pre-emptive' intervention strategies.
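The FDL definition in the abstract (a lineage accounting for more than 10% of the sequences added in a given week) can be illustrated with a minimal sketch. This is not the DeepAutoCoV code: the function names `first_dominant_week` and `lead_time_weeks`, the `(week, lineage)` record format, and the toy data are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def first_dominant_week(records, threshold=0.10):
    """Return {lineage: first week in which its share exceeds threshold}.

    `records` is an iterable of (week, lineage) pairs, one per sequence.
    This record format is hypothetical, not the GISAID schema.
    """
    weekly_counts = defaultdict(Counter)
    for week, lineage in records:
        weekly_counts[week][lineage] += 1

    dominant = {}
    for week in sorted(weekly_counts):
        counts = weekly_counts[week]
        total = sum(counts.values())
        for lineage, n in counts.items():
            # Strictly greater than the threshold, matching ">10%".
            if n / total > threshold and lineage not in dominant:
                dominant[lineage] = week
    return dominant


def lead_time_weeks(flag_week, dominance_week):
    """Weeks between an early flag and the week dominance is reached."""
    return dominance_week - flag_week


# Toy data: lineage "B" is 5% of week 1 and 20% of week 2.
records = [(1, "A")] * 19 + [(1, "B")] + [(2, "A")] * 16 + [(2, "B")] * 4
print(first_dominant_week(records))
```

In this toy example, a detector that flagged lineage "B" in week 1 would have a lead time of one week, since "B" first exceeds the 10% threshold in week 2.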

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/235b/11500442/42c77b57b8d7/bbae535f1.jpg
