Department of Computer Science, University of Uyo, P.M.B. 1017, Uyo, 520003, Nigeria.
Centre for Research and Development, University of Uyo, P.M.B. 1017, Uyo, 520003, Nigeria.
Sci Rep. 2021 Jul 15;11(1):14558. doi: 10.1038/s41598-021-93757-w.
While a rush of attention clouded the early stages of the coronavirus spread, knowledge of the actual pathogenicity and origin of possible sub-strains remained unclear. Drawing on the Global Initiative on Sharing All Influenza Data (GISAID) database ( https://www.gisaid.org/ ) between December 2019 and January 15, 2021, we analyzed a total of 8864 complete human SARS-CoV-2 genome sequences, processed by gender, from 88 countries across 6 continents (Antarctica excluded). We hypothesized that the data speak for themselves and can reveal true and explainable patterns of the disease. Analyses of identical genome diversity and pattern correlates, performed using a hybrid of biotechnology and machine learning methods, corroborate the emergence of inter- and intra-SARS-CoV-2 sub-strain transmission and support an increase in sub-strains within the various continents, with nucleotide mutations varying dynamically between individuals in close association with the virus as it adapts to its host/environment. Interestingly, some viral sub-strain patterns progressively transformed into new sub-strain clusters, indicating varying amino acid and strong nucleotide associations derived from the same lineage. A novel cognitive approach to knowledge mining aided the discovery of transmission routes and a seamless contact tracing protocol. Our classification results outperformed those of state-of-the-art methods, indicating a more robust system for predicting emerging or new viral sub-strain(s). The results therefore offer explanations for the growing concerns about the virus and its next wave(s). A future direction of this work is the defuzzification of confusable pattern clusters for precise intra-country SARS-CoV-2 sub-strain analytics.