Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability.

Affiliations

Faculty of Engineering, University of Deusto, Bilbao, Biscay, Spain.

National Microbiology Center (NMC), Instituto de Salud Carlos III (ISCIII), Majadahonda, Madrid, Spain.

Publication information

PLoS One. 2024 Aug 26;19(8):e0309391. doi: 10.1371/journal.pone.0309391. eCollection 2024.

Abstract

The global impact of the SARS-CoV-2 pandemic has underscored the need for a deeper understanding of viral evolution in order to anticipate new viruses or variants. Genetic recombination is a fundamental mechanism in viral evolution, yet it remains poorly understood. In this study, we carried out a comprehensive analysis of the genetic regions associated with genetic recombination features in SARS-CoV-2. To this end, we implemented a two-phase transfer learning approach using genomic spectrograms of complete SARS-CoV-2 sequences. In the first phase, we trained a pre-trained VGG-16 model on genomic spectrograms of HIV-1, and in the second phase, we applied the resulting HIV-1 VGG-16 model to SARS-CoV-2 spectrograms. Key recombination hot zones were identified using the Grad-CAM interpretability tool, and the results were analyzed with mathematical and image processing techniques. Our findings unequivocally identify the SARS-CoV-2 Spike protein (S protein) as the pivotal region for the genetic recombination feature. For non-recombinant sequences, the relevant frequencies clustered around 1/6 and 1/12. In recombinant sequences, the main hot zone in the Spike protein stood out sharply and indicated a frequency of 1/6. These findings suggest that, in the arithmetic series, every 6 nucleotides (two triplets) in S may encode crucial information, potentially concealing essential details about viral characteristics, in this case the recombinant feature of a SARS-CoV-2 genetic sequence. This insight further underscores the potential presence of multifaceted information within the genome, including mathematical signatures that define an organism's unique attributes.
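To make the notion of a "frequency of 1/6" concrete, the following is a minimal sketch of one common way to build a genomic spectrogram: each nucleotide is mapped to a binary indicator signal, and a short-time discrete Fourier transform is taken over sliding windows. The abstract does not specify how the authors constructed their spectrograms, so the encoding, the window and step sizes, and the function names (`indicator_sequences`, `genomic_spectrogram`, `dominant_frequency`) are all assumptions made for illustration. The input is a synthetic toy sequence with a period of 6 nucleotides, not SARS-CoV-2 data.

```python
import numpy as np


def indicator_sequences(seq):
    """Map a nucleotide string to four binary indicator signals (A, C, G, T).

    This encoding is an assumption; the abstract does not state which one was used.
    """
    seq = seq.upper()
    return {
        base: np.array([1.0 if ch == base else 0.0 for ch in seq])
        for base in "ACGT"
    }


def genomic_spectrogram(seq, window=120, step=60):
    """Short-time DFT power summed over the four indicator signals.

    Returns (freqs, power), where freqs is in cycles per nucleotide and
    power[i, j] is the power at freqs[i] in the j-th window.
    The window and step sizes are hypothetical defaults.
    """
    signals = indicator_sequences(seq)
    starts = range(0, len(seq) - window + 1, step)
    freqs = np.fft.rfftfreq(window)
    power = np.zeros((len(freqs), len(starts)))
    for j, start in enumerate(starts):
        for signal in signals.values():
            chunk = signal[start:start + window]
            chunk = chunk - chunk.mean()  # remove the DC component
            power[:, j] += np.abs(np.fft.rfft(chunk)) ** 2
    return freqs, power


def dominant_frequency(freqs, power):
    """Frequency with the highest power averaged over all windows."""
    return freqs[np.argmax(power.mean(axis=1))]


if __name__ == "__main__":
    # Synthetic sequence that repeats every 6 nucleotides (two triplets).
    toy_sequence = "AAACCC" * 100
    freqs, power = genomic_spectrogram(toy_sequence)
    print("dominant frequency:", dominant_frequency(freqs, power))
```

With a window length that is a multiple of 6, the frequency 1/6 falls exactly on a DFT bin, which is why 120 was chosen for this toy case. In the study itself, spectrogram images of this general kind are what the VGG-16 model receives as input, and Grad-CAM highlights which regions of the image drive the classification.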

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7342/11346643/228f70c17890/pone.0309391.g001.jpg
