
Understanding mutation hotspots for the SARS-CoV-2 spike protein using Shannon Entropy and K-means clustering.

Affiliations

Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.

Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.

Publication information

Comput Biol Med. 2021 Nov;138:104915. doi: 10.1016/j.compbiomed.2021.104915. Epub 2021 Oct 5.

DOI:10.1016/j.compbiomed.2021.104915
PMID:34655896
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8492016/
Abstract

The SARS-CoV-2 virus, like many other viruses, has continually evolved, giving rise to new variants through mutations, commonly substitutions and indels. In some cases these mutations give the virus a survival advantage, making the mutants dangerous. In general, laboratory investigation must be carried out to determine whether new variants have characteristics that make them more lethal and contagious. Complex and time-consuming analyses are therefore required to establish the exact impact of a particular mutation. The time these analyses take makes it difficult to understand variants of concern, which limits the preventive action that can be taken against their rapid spread. In this analysis, we deploy a statistical technique, Shannon entropy, to identify the positions in the SARS-CoV-2 spike protein sequence that are most susceptible to mutation. We then use machine-learning-based clustering techniques to cluster known dangerous mutations by similarity in their properties. This work uses embeddings generated by a protein language model, ProtBERT, to identify mutations of a similar nature and to pick out regions of interest based on proneness to change. Our entropy-based analysis predicted fifteen hotspot regions; we were able to validate ten known variants of interest in six of them. As the SARS-CoV-2 situation evolves rapidly, we believe the remaining nine mutational hotspots may contain variants that could emerge in the future. We believe this approach may help the research community devise therapeutics based on probable new mutation zones in the viral sequence and on resemblance in the properties of various mutations.
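The entropy step described in the abstract can be illustrated with a minimal sketch. It assumes the spike sequences are already aligned to equal length and scores each position with the Shannon entropy H = -Σ p(a) log2 p(a) over the residues observed there. The function names, the toy alignment, and the 0.5-bit threshold are illustrative assumptions, not the authors' pipeline or parameters.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def positional_entropy(aligned_seqs):
    """Entropy at every position of a set of equal-length aligned sequences."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*seqs) walks the alignment column by column.
    return [column_entropy(col) for col in zip(*aligned_seqs)]


def hotspots(entropies, threshold):
    """1-based positions whose entropy exceeds the (assumed) threshold."""
    return [i + 1 for i, h in enumerate(entropies) if h > threshold]


# Toy alignment (made up for illustration, not real spike data).
toy_alignment = ["MFVFL", "MFVFL", "MFAFL", "MYVFL"]
entropies = positional_entropy(toy_alignment)
toy_hotspots = hotspots(entropies, threshold=0.5)
print(toy_hotspots)
```

A fully conserved position scores 0 bits, and the score grows as residues at that position become more varied. In this sketch a gap character in the alignment would simply count as another symbol, which is one possible convention among several.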

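The clustering step can be sketched in the same spirit. The abstract states that mutations are grouped using ProtBERT embeddings and, per the title, K-means clustering. The sketch below implements plain Lloyd's K-means in pure Python and runs it on made-up two-dimensional vectors that stand in for real embeddings. The mutation labels (`mut_A` and so on), the vectors, and the choice of `k=2` are all hypothetical; real ProtBERT embeddings are much higher-dimensional and would come from the model itself.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Hypothetical stand-ins for per-mutation embedding vectors.
toy_embeddings = {
    "mut_A": [0.10, 0.20],
    "mut_B": [0.15, 0.25],
    "mut_C": [0.90, 0.80],
    "mut_D": [0.95, 0.85],
}
names = list(toy_embeddings)
labels, _ = kmeans([toy_embeddings[n] for n in names], k=2)
clusters = dict(zip(names, labels))
print(clusters)
```

Mutations whose vectors lie close together receive the same cluster label. The numeric value of each label is arbitrary, so only co-membership is meaningful.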

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2fcb/8492016/e0c291673c00/gr4_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2fcb/8492016/4c5730fae87f/gr1_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2fcb/8492016/930f9162593f/gr2_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2fcb/8492016/655fcd14524b/gr3_lrg.jpg
