Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability.

Affiliations

Faculty of Engineering, University of Deusto, Bilbao, Biscay, Spain.

National Microbiology Center (NMC), Instituto de Salud Carlos III (ISCIII), Majadahonda, Madrid, Spain.

Publication information

PLoS One. 2024 Aug 26;19(8):e0309391. doi: 10.1371/journal.pone.0309391. eCollection 2024.

DOI:10.1371/journal.pone.0309391

PMID:39186542

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11346643/

Abstract

The global impact of the SARS-CoV-2 pandemic has underscored the need for a deeper understanding of viral evolution to anticipate new viruses or variants. Genetic recombination is a fundamental mechanism in viral evolution, yet it remains poorly understood. In this study, we conducted a comprehensive analysis of the genetic regions associated with genetic recombination features in SARS-CoV-2. To this end, we implemented a two-phase transfer learning approach using genomic spectrograms of complete SARS-CoV-2 sequences. In the first phase, we used a pre-trained VGG-16 model with genomic spectrograms of HIV-1, and in the second phase, we applied the resulting HIV-1 VGG-16 model to SARS-CoV-2 spectrograms. Key recombination hot zones were identified using the Grad-CAM interpretability tool, and the results were analyzed with mathematical and image processing techniques. Our findings unequivocally identify the SARS-CoV-2 Spike protein (S protein) as the pivotal region for the genetic recombination feature. For non-recombinant sequences, the relevant frequencies clustered around 1/6 and 1/12. In recombinant sequences, the sharp prominence of the main hot zone in the Spike protein indicated a frequency of 1/6. These findings suggest that, in an arithmetic series, every 6 nucleotides (two triplets) in S may encode crucial information, potentially concealing essential details about viral characteristics, in this case the recombinant feature of a SARS-CoV-2 genetic sequence. This insight further underscores the potential presence of multifaceted information within the genome, including mathematical signatures that define an organism's unique attributes.
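To make the notion of a "frequency of 1/6" in a genomic spectrogram more concrete, the following is a minimal sketch only. The abstract does not say how the spectrograms were built, so this example assumes a binary indicator mapping (one 0/1 signal per base) and a sliding-window discrete Fourier transform. The function names `indicator_power`, `spectrogram`, and `dominant_frequency`, the window and step sizes, and the toy sequence are all hypothetical. It does not reproduce the VGG-16 or Grad-CAM stages.

```python
import cmath

BASES = "ACGT"


def indicator_power(window: str, k: int) -> float:
    """Power at DFT bin k, summed over the four base-indicator signals.

    Assumption: each base gets its own 0/1 indicator signal; the paper's
    actual numeric encoding is not given in the abstract.
    """
    n = len(window)
    total = 0.0
    for base in BASES:
        # DFT coefficient of the indicator signal: only positions holding
        # this base contribute a non-zero term.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k * i / n)
            for i, ch in enumerate(window)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total


def spectrogram(seq: str, window: int = 60, step: int = 30) -> list[list[float]]:
    """Rows are windows along the sequence; columns are DFT bins 1..window//2."""
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        rows.append([indicator_power(chunk, k) for k in range(1, window // 2 + 1)])
    return rows


def dominant_frequency(spec: list[list[float]], window: int = 60) -> float:
    """Frequency (cycles per nucleotide) with the largest mean power."""
    n_bins = len(spec[0])
    mean_power = [sum(row[j] for row in spec) / len(spec) for j in range(n_bins)]
    best_bin = max(range(n_bins), key=mean_power.__getitem__) + 1  # bins start at 1
    return best_bin / window


# Synthetic toy sequence with an exact 6-nucleotide repeat (not viral data).
toy = "AAAGGG" * 40
print(round(dominant_frequency(spectrogram(toy)), 4))  # → 0.1667
```

In this toy case the strongest bin is 10 of 60, that is, one cycle every 6 nucleotides. Real genomes are far noisier, and which frequencies stand out would depend on the encoding and window length chosen.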


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7342/11346643/228f70c17890/pone.0309391.g001.jpg


