Chakraborty Chiranjib, Bhattacharya Manojit, Pal Soumen, Lee Sang-Soo
Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal 700126, India.
Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore 756020, Odisha, India.
Int J Biol Macromol. 2025 Jan;287:138547. doi: 10.1016/j.ijbiomac.2024.138547. Epub 2024 Dec 8.
This study aims to identify and characterize antibody escape mutations in the N-terminal domain (NTD) and receptor-binding domain (RBD) of the SARS-CoV-2 Spike protein using prompt engineering-enabled combined large language models (LLMs) and instigative bioinformatics techniques. We used two LLMs (ChatGPT and Mistral 7B) and one multimodal LLM (MLLM; the Gemini model) to retrieve the significant NTD and RBD mutations, and then characterized the retrieved mutations with in silico servers. We retrieved 15 significant NTD mutations (six deletions and nine point mutations) and 17 significant RBD point mutations. We further characterized them in terms of distribution, count, ΔΔG of mutation (ΔΔG mCSM, ΔΔG DUET, ΔΔG SDM) to determine whether each mutation is stabilizing or destabilizing, interaction interface, distance to the protein-protein interaction (PPI) interface, change in vibrational entropy energy (ΔΔSVib ENCoM), and change in flexibility. Every mutation's ΔΔG, interactions, and related parameters were analyzed using the trimeric Spike protein complex. Of the five NTD mutations analyzed, two are destabilizing (G142D, R190S) and three are stabilizing (D215G, A222V, and R246I). Some RBD mutations are entirely destabilizing (K417N, K417T, L452R, F490S), whereas N440K, N460K, and Q493R show stabilizing and neutral properties. Together, the combined LLMs and instigative bioinformatics techniques identified and characterized these antibody escape mutations. With this strategy, LLMs and MLLMs can help fight future pandemic viruses by quickly identifying mutations and their significance.
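The stability calls reported in the abstract can be tabulated programmatically. The following is a minimal Python sketch, not the authors' pipeline: it parses point-mutation codes such as G142D, assigns each to a Spike domain, and groups them by the label given in the abstract. The domain boundaries (NTD 14 to 305, RBD 319 to 541) are approximate values assumed here for illustration, and the names `parse_mutation`, `domain_of`, and `summarise` are hypothetical helpers.

```python
import re
from collections import defaultdict
from typing import NamedTuple

# Approximate Spike domain boundaries. ASSUMPTION: published ranges differ
# by a few residues; these are not taken from the abstract.
DOMAINS = {"NTD": (14, 305), "RBD": (319, 541)}

# Stability labels as reported in the abstract (no numeric values are given there).
REPORTED = {
    "G142D": "destabilizing", "R190S": "destabilizing",
    "D215G": "stabilizing", "A222V": "stabilizing", "R246I": "stabilizing",
    "K417N": "destabilizing", "K417T": "destabilizing",
    "L452R": "destabilizing", "F490S": "destabilizing",
    "N440K": "stabilizing/neutral", "N460K": "stabilizing/neutral",
    "Q493R": "stabilizing/neutral",
}


class Mutation(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # residue number in the Spike sequence
    mutant: str     # one-letter code of the substituted residue


_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code: str) -> Mutation:
    """Split a point-mutation code such as 'G142D' into its three parts."""
    match = _PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a point-mutation code: {code!r}")
    return Mutation(match.group(1), int(match.group(2)), match.group(3))


def domain_of(position: int) -> str:
    """Return the domain containing a residue position, or 'other'."""
    for name, (start, end) in DOMAINS.items():
        if start <= position <= end:
            return name
    return "other"


def summarise(reported: dict) -> dict:
    """Group mutation codes by domain, then by reported stability label."""
    table = defaultdict(lambda: defaultdict(list))
    for code, label in reported.items():
        domain = domain_of(parse_mutation(code).position)
        table[domain][label].append(code)
    return {domain: dict(labels) for domain, labels in table.items()}


if __name__ == "__main__":
    for domain, labels in summarise(REPORTED).items():
        for label, codes in labels.items():
            print(f"{domain}  {label}: {', '.join(codes)}")
```

Deletions are left out because the abstract does not name them, and no ΔΔG values are included because none are reported there.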