Center of Excellence in IT, Institute of Management Sciences, Hayatabad, Peshawar, 25000, Khyber Pakhtunkhwa, Pakistan.
Department of Embedded Systems Engineering, Incheon National University, Incheon, Korea.
Interdiscip Sci. 2022 Jun;14(2):504-519. doi: 10.1007/s12539-021-00465-0. Epub 2021 Aug 6.
The recent pandemic of COVID-19 (coronavirus disease), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread with unusual speed and lethality. It has infected millions of people and continues to take a devastating toll on the health and well-being of the global population. In this situation, genome sequence analysis and advanced artificial intelligence techniques may help researchers and medical experts understand the genetic variants of SARS-CoV-2. Genome sequence analysis of COVID-19 is crucial to understanding the virus's origin, behavior, and structure, which might help in developing vaccines, antiviral drugs, and efficient preventive strategies. This paper introduces an artificial intelligence based system to perform genome sequence analysis of COVID-19 and similar viruses, e.g., SARS, Middle East respiratory syndrome (MERS), and Ebola. The system helps extract important information from the genome sequences of different viruses. We perform comparative data analysis by extracting basic information from COVID-19 and other genome sequences, including nucleotide composition and frequency, tri-nucleotide composition, amino acid counts, alignment between genome sequences, and their DNA similarity. We use different visualization methods to analyze these viruses' genome sequences and, finally, apply a machine learning based classifier, the support vector machine, to classify the different genome sequences. The data set of virus genome sequences is obtained from a publicly accessible online data repository. The system achieves good classification results, with an accuracy of 97% for COVID-19, 96% for SARS, and 95% for both MERS and Ebola genome sequences.
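The basic sequence descriptors listed in the abstract (nucleotide frequency, tri-nucleotide composition, amino acid counts) can be illustrated with a minimal sketch using only the Python standard library. This is not the authors' implementation: the function names, the toy sequences, and the choice of the standard genetic code read in frame 0 are all assumptions made for illustration. The k-mer Jaccard score is a simple alignment-free stand-in for similarity, not the alignment-based comparison the paper describes.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here;
# the abstract does not say which translation table was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
VALID = set("ACGT")


def nucleotide_frequency(seq):
    """Relative frequency of A, C, G, T, ignoring ambiguous symbols such as N."""
    counts = Counter(b for b in seq.upper() if b in VALID)
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"} if total else {}


def trinucleotide_counts(seq):
    """Counts of overlapping 3-mers made up only of A, C, G, T."""
    seq = seq.upper()
    kmers = (seq[i:i + 3] for i in range(len(seq) - 2))
    return Counter(k for k in kmers if set(k) <= VALID)


def amino_acid_counts(seq):
    """Amino acid counts from a frame-0 translation ('*' marks a stop codon)."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return Counter(CODON_TABLE[c] for c in codons if c in CODON_TABLE)


def kmer_jaccard(a, b, k=3):
    """Alignment-free similarity: Jaccard index of the two k-mer sets."""
    ka = {a[i:i + k].upper() for i in range(len(a) - k + 1)}
    kb = {b[i:i + k].upper() for i in range(len(b) - k + 1)}
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


if __name__ == "__main__":
    toy = "ATGGCGTAA"  # hypothetical toy sequence, not real viral data
    print(nucleotide_frequency(toy))
    print(trinucleotide_counts(toy))
    print(amino_acid_counts(toy))
    print(kmer_jaccard(toy, "ATGGCGTGA"))
```

Feature vectors built from such counts, for example normalized tri-nucleotide frequencies, are one plausible input to a support vector machine classifier, although the abstract does not specify which features the authors actually used.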