Paying attention to the SARS-CoV-2 dialect: a deep neural network approach to predicting novel protein mutations.

Authors

Elkin Magdalyn E, Zhu Xingquan

Affiliation

Dept. of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, FL, 33431, USA.

Publication

Commun Biol. 2025 Jan 21;8(1):98. doi: 10.1038/s42003-024-07262-7.

Abstract

Predicting novel mutations has long-lasting impacts on life science research. Traditionally, this problem is addressed through wet-lab experiments, which are often expensive and time-consuming. Recent advances in neural language models have produced impressive results in modeling and deciphering sequences. In this paper, we propose a Deep Novel Mutation Search (DNMS) method that uses deep neural networks to model protein sequences for mutation prediction. We use the SARS-CoV-2 spike protein as the target and a protein language model to predict novel mutations. Unlike existing research, which is often limited to mutating the reference sequence for prediction, we propose a parent-child mutation prediction paradigm in which a parent sequence is modeled for mutation prediction. Because mutations change the context of the underlying sequence, DNMS models three aspects of a protein sequence: semantic change, grammatical change, and attention change, which capture shifts in semantics, grammatical coherence, and amino-acid interactions in latent space, respectively. A ranking approach combines all three aspects to capture mutations that demonstrate evolving traits, in accordance with real-world SARS-CoV-2 spike protein sequence evolution. DNMS can be adopted in an early-warning variant detection system, raising public health awareness of future SARS-CoV-2 mutations.
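
The abstract does not spell out how the three aspects are combined, so the following is only an illustrative sketch of one common way to do it: rank candidates separately on each aspect and order them by the sum of their ranks. The function names, the placeholder mutation labels, the scores, and the assumption that a higher score is more favorable on every aspect are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def rank_descending(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each candidate to its rank (1 = highest score); ties broken by name."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


def combine_by_rank_sum(
    semantic_change: Dict[str, float],
    grammaticality: Dict[str, float],
    attention_change: Dict[str, float],
) -> List[str]:
    """Order candidates by the sum of their per-aspect ranks (lower sum first).

    Assumption: all three dictionaries share the same keys, and a higher
    score is better on every aspect. The paper may define this differently.
    """
    per_aspect = [
        rank_descending(scores)
        for scores in (semantic_change, grammaticality, attention_change)
    ]
    total = {name: sum(ranks[name] for ranks in per_aspect) for name in semantic_change}
    return sorted(total, key=lambda name: (total[name], name))


# Made-up scores for three placeholder candidate mutations (not real data).
semantic = {"mut_a": 0.9, "mut_b": 0.4, "mut_c": 0.1}
grammar = {"mut_a": 0.2, "mut_b": 0.8, "mut_c": 0.5}
attention = {"mut_a": 0.6, "mut_b": 0.7, "mut_c": 0.3}

ranking = combine_by_rank_sum(semantic, grammar, attention)
print(ranking)
```

Summing ranks rather than raw scores avoids having to put the three aspects on a common scale, which is why it is a frequent choice when the individual scores are not directly comparable.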

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54d/11751191/07d1952230a5/42003_2024_7262_Fig1_HTML.jpg
