Genome-wide identification and prediction of SARS-CoV-2 mutations show an abundance of variants: Integrated study of bioinformatics and deep neural learning.

Authors

Hossain Md Shahadat, Pathan A Q M Sala Uddin, Islam Md Nur, Tonmoy Mahafujul Islam Quadery, Rakib Mahmudul Islam, Munim Md Adnan, Saha Otun, Fariha Atqiya, Reza Hasan Al, Roy Maitreyee, Bahadur Newaz Mohammed, Rahaman Md Mizanur

Affiliations

Department of Biotechnology & Genetic Engineering, Noakhali Science and Technology University, Noakhali, Bangladesh.

Department of Computer Science and Telecommunication Engineering, Noakhali Science and Technology University, Noakhali, Bangladesh.

Publication

Inform Med Unlocked. 2021;27:100798. doi: 10.1016/j.imu.2021.100798. Epub 2021 Nov 18.

Abstract

Genomic data analysis is a fundamental tool for monitoring pathogen evolution and outbreaks of infectious disease. Using bioinformatics and deep learning, this study was designed to identify the genomic variability of SARS-CoV-2 worldwide and to predict future mutation rates. Analysis of 259,044 SARS-CoV-2 isolates identified 3,334,545 mutations, with an average of 14.01 mutations per isolate. Globally, single-nucleotide polymorphisms (SNPs) were the most prevalent mutational events. C > T (52.67%) was the predominant substitution worldwide, followed by G > T (14.59%) and A > G (11.13%). Strains from India carried the highest number of mutations (48), followed by strains from Scotland, the USA, the Netherlands, Norway, and France, with up to 36 mutations. D416G, F106F, P314L, UTR:C241T, L93L, A222V, A199A, V30L, and A220V were the most frequent mutations. The missense mutations D1118H, S194L, R262H, M809L, P314L, A8D, S220G, A890D, G1433C, T1456I, R233C, F263S, L111K, A54T, A74V, L183A, A316T, V212F, L46C, V48G, Q57H, W131R, G172V, Q185H, and Y206S were found to substantially decrease the structural stability of the corresponding proteins. Conversely, D3L, L5F, and S97I were found to substantially increase the structural stability of the corresponding proteins. The multi-nucleotide mutations GGG > AAC, CC > TT, TG > CA, and AT > TA appeared among the 20 most frequent mutations in our analysis. The mutation rate forecast predicts increases of 17%, 7%, and 3% in C > T, A > G, and A > T, respectively. Conversely, decreases of 7%, 7%, and 6% are estimated for T > C, G > A, and G > T, respectively. T > G/A, C > G/A, and A > T/C are not anticipated in the future. Because SARS-CoV-2 is mutating continuously, our findings will facilitate the tracking of mutations and help to map the progression of COVID-19 intensity worldwide.
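The summary statistics in the abstract (share of each substitution type, split between SNPs and multi-nucleotide mutations, mean mutations per isolate) can be illustrated with a small tally. The following is a minimal sketch, not the authors' pipeline: the function names, the `(ref, alt)` input format, and the toy isolates are all hypothetical and chosen only to show the bookkeeping.

```python
from collections import Counter

def substitution_spectrum(mutations):
    """Return {"REF>ALT": percent of all mutations}, most frequent first.

    `mutations` is a list of (ref, alt) string pairs (assumed input format).
    """
    counts = Counter(f"{ref}>{alt}" for ref, alt in mutations)
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.most_common()}

def is_snp(ref, alt):
    """A single-nucleotide change has one reference and one alternate base."""
    return len(ref) == 1 and len(alt) == 1

def mean_mutations_per_isolate(per_isolate):
    """Average number of mutations over all isolates in the mapping."""
    total = sum(len(muts) for muts in per_isolate.values())
    return total / len(per_isolate)

# Toy data (hypothetical, not taken from the study).
toy_isolates = {
    "iso1": [("C", "T"), ("C", "T"), ("A", "G")],
    "iso2": [("C", "T"), ("G", "T")],
    "iso3": [("GGG", "AAC")],
}

all_mutations = [m for muts in toy_isolates.values() for m in muts]
spectrum = substitution_spectrum(all_mutations)
snp_count = sum(is_snp(ref, alt) for ref, alt in all_mutations)

for kind, pct in spectrum.items():
    print(f"{kind}: {pct:.2f}%")
print("SNPs:", snp_count, "of", len(all_mutations))
print("mean per isolate:", mean_mutations_per_isolate(toy_isolates))
```

In a real analysis the `(ref, alt)` pairs would come from variant calls against a reference genome; that step is outside the scope of this sketch.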


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d552/8598266/1a596f5faf2b/gr1_lrg.jpg
