School of Information Technology, Deakin University, Victoria, Australia.
School of Engineering, Deakin University, Victoria, Australia.
Sci Rep. 2021 Feb 10;11(1):3487. doi: 10.1038/s41598-021-83105-3.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly pathogenic virus that has caused the global COVID-19 pandemic. Tracing the evolution and transmission of the virus is crucial for responding to and controlling the pandemic through appropriate intervention strategies. This paper reports and analyses genomic mutations in the coding regions of SARS-CoV-2, together with the changes in protein secondary structure and solvent accessibility that they are likely to cause, as predicted by deep learning models. The predictions suggest that mutation D614G in the virus spike protein, which has attracted much attention from researchers, is unlikely to change protein secondary structure or relative solvent accessibility. Based on 6324 viral genome sequences, we create a spreadsheet dataset of point mutations that can facilitate the investigation of SARS-CoV-2 from many perspectives, especially in tracing the evolution and worldwide spread of the virus. Our analysis also shows that the coding genes E, M, ORF6, ORF7a, ORF7b and ORF10 are the most stable, making them potentially suitable targets for vaccine and drug development.
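As an illustration of the point-mutation notation used in the abstract (for example D614G: reference residue D, position 614, variant residue G), the following minimal Python sketch lists substitutions between two pre-aligned protein sequences. It is not the authors' pipeline: the function name `point_mutations` and the toy sequences are hypothetical, and the sketch assumes the sequences are already aligned to equal length with no insertions or deletions.

```python
def point_mutations(reference: str, variant: str) -> list[str]:
    """Return amino-acid substitutions as <ref><1-based position><alt>.

    Assumption: both sequences are already aligned and have equal length,
    so only substitutions (no insertions or deletions) are considered.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, variant), start=1)
        if ref != alt
    ]


# Toy sequences for illustration only, not real SARS-CoV-2 data.
reference = "MKTDLV"
variant = "MKTGLV"
print(point_mutations(reference, variant))  # ['D4G']
```

Applied to a reference spike protein and an aligned variant, the same comparison would report a D-to-G change at position 614 as `D614G`.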