Pathan Refat Khan, Biswas Munmun, Khandaker Mayeen Uddin
Department of Computer Science and Engineering, BGC Trust University Bangladesh, Chittagong-4381, Bangladesh.
Centre for Biomedical Physics, School of Healthcare and Medical Sciences, Sunway University, 47500 Bandar Sunway, Selangor, Malaysia.
Chaos Solitons Fractals. 2020 Sep;138:110018. doi: 10.1016/j.chaos.2020.110018. Epub 2020 Jun 13.
SARS-CoV-2, the novel coronavirus that causes COVID-19, has created a global pandemic and brought much of the world to a standstill. As of June 15, 2020, more than 7.9 million people had been infected and about 432,000 had died. This RNA virus is able to mutate within the human body. Accurate determination of mutation rates is essential for understanding the evolution of the virus and for assessing the risk of emergent infectious disease. This study explores the mutation rate of whole genomic sequences gathered from patient datasets from different countries. The collected dataset is processed to determine nucleotide mutations and codon mutations separately. Based on the size of the dataset, the resulting mutation rates are grouped into four regions: China, Australia, the United States, and the rest of the world. In all regions, a large share of Thymine (T) and Adenine (A) is found to mutate to other nucleotides, whereas codons mutate less frequently than nucleotides. A recurrent neural network-based Long Short-Term Memory (LSTM) model is applied to predict the future mutation rate of the virus. The LSTM model gives a Root Mean Square Error (RMSE) of 0.06 in testing and 0.04 in training, which is regarded as an optimized value. Using this training and testing process, the nucleotide mutation rate for 400 future patients is predicted. An increase of about 0.1% in mutation rate is found for nucleotide mutations from T to C and G, C to G, and G to T, while a decrease of 0.1% is seen for T to A and A to C. The model can be used to predict mutation rates on a daily basis if more, regularly updated patient data become available.
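The abstract does not state how the nucleotide mutation rate or the RMSE were computed, so the following is only a minimal sketch under stated assumptions: each patient sequence is already aligned to a reference genome of the same length, the rate of a substitution is its count divided by the number of comparable positions, and RMSE follows its standard definition. The function names `substitution_rates` and `rmse` and the toy sequences are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import sqrt

NUCLEOTIDES = "ACGT"


def substitution_rates(reference, sample):
    """Per-substitution rates between two aligned, equal-length sequences.

    Assumption (not from the paper): rate = substitutions of a given type
    divided by the number of positions where both bases are A, C, G or T.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    compared = 0
    for ref_base, new_base in zip(reference.upper(), sample.upper()):
        # Skip gaps and ambiguous bases such as '-' or 'N'.
        if ref_base not in NUCLEOTIDES or new_base not in NUCLEOTIDES:
            continue
        compared += 1
        if ref_base != new_base:
            counts[(ref_base, new_base)] += 1
    if compared == 0:
        return {}
    return {pair: n / compared for pair, n in counts.items()}


def rmse(predicted, observed):
    """Root mean square error between two equal-length lists of numbers."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))


# Toy data, invented for illustration only.
rates = substitution_rates("ATGCTTAA", "ACGCTGAN")
for (ref_base, new_base), rate in sorted(rates.items()):
    print(f"{ref_base} -> {new_base}: {rate:.3f}")
print(f"RMSE: {rmse([0.1, 0.2], [0.1, 0.4]):.3f}")
```

In a pipeline of the kind described, per-patient rates like these could form the time-ordered series that an LSTM is trained on, with `rmse` applied to the training and testing predictions; that connection is an interpretation of the abstract, not a description of the authors' code.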