Naghibzadeh Mahmoud, Savari Hossein, Savadi Abdorreza, Saadati Nayyereh, Mehrazin Elahe
Knowledge Engineering Research Group, Computer Engineering Dept., Ferdowsi University of Mashhad, Mashhad, Iran.
High Performance Computing Lab., Computer Engineering Dept., Ferdowsi University of Mashhad, Mashhad, Iran.
Inform Med Unlocked. 2020;19:100356. doi: 10.1016/j.imu.2020.100356. Epub 2020 May 21.
The recent outbreak of the coronavirus disease Covid-19 has led the World Health Organization to declare a pandemic. The core of this virus is its genome sequence, which interferes with the normal activities of its counterparts within humans. Analysis of this genome may provide clues toward the proper treatment of patients and the design of new drugs and vaccines. Microsatellites are short genome subsequences that are repeated successively, many times, in the same direction. They are highly variable in their building blocks, number of repeats, and locations in the genome sequence. This mutability has been the source of many diseases. Usually the host genome is analyzed to diagnose possible diseases in the victim. This research focuses instead on the attacker's genome, in order to discover its malicious properties.
This research focuses on the microsatellites of both SARS and Covid-19. An accurate and highly efficient computer method for identifying all microsatellites in a genome sequence, called Fast Microsatellite Discovery (FMSD), is developed and implemented. Microsatellite discovery in FMSD is based on an efficient indexing technique called K-Mer Hash Indexing. The method is used to find all microsatellites in the Covid-19 coronavirus and SARS 2003 genomes, and a table of all microsatellites found is reported. There are many differences between SARS and Covid-19, and one outstanding difference requires further investigation.
FMSD is freely available at https://gitlab.com/FUM_HPCLab/fmsd_project and is implemented in C on a Linux (Ubuntu) system. Software-related contact: hossein_savari@mail.um.ac.ir.
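To make the definition of a microsatellite concrete, the following is a minimal sketch in C of finding tandem repeats of a fixed motif length. It is not the FMSD implementation and does not use K-Mer Hash Indexing; it is a plain period scan, swapped in for illustration. The names `Repeat`, `find_repeats`, and `is_primitive`, and the `min_copies` threshold, are assumptions introduced here and do not come from the FMSD source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one tandem repeat (not an FMSD type). */
typedef struct {
    size_t start;   /* 0-based offset of the first motif copy */
    size_t k;       /* motif length in bases */
    size_t copies;  /* number of complete, consecutive motif copies */
} Repeat;

/* Returns 1 if the k-base motif is not itself a repetition of a
 * shorter unit (e.g. "AC" is primitive, "TT" and "ACAC" are not). */
static int is_primitive(const char *m, size_t k)
{
    for (size_t d = 1; d < k; d++) {
        if (k % d != 0)
            continue;
        size_t t = d;
        while (t < k && m[t] == m[t - d])
            t++;
        if (t == k)
            return 0;   /* motif has the shorter period d */
    }
    return 1;
}

/* Scans seq for maximal runs with period k (seq[i] == seq[i + k]) and
 * stores those with at least min_copies complete copies of a primitive
 * motif. Returns the number of records written to out (at most max_out).
 * Runs in time linear in the sequence length for a fixed k. */
size_t find_repeats(const char *seq, size_t k, size_t min_copies,
                    Repeat *out, size_t max_out)
{
    assert(seq != NULL && out != NULL);
    size_t n = strlen(seq);
    size_t found = 0;
    if (k == 0 || n < k)
        return 0;

    size_t i = 0;
    while (i + k < n) {
        size_t j = i;
        /* Extend while each base equals the base k positions ahead. */
        while (j + k < n && seq[j] == seq[j + k])
            j++;
        /* seq[i .. j+k-1] has period k, so its length is j - i + k. */
        size_t copies = (j - i + k) / k;
        if (copies >= min_copies && is_primitive(seq + i, k) &&
            found < max_out) {
            out[found++] = (Repeat){ i, k, copies };
        }
        /* Any run starting inside [i, j] is contained in this one. */
        i = j + 1;
    }
    return found;
}
```

Under these assumptions, calling `find_repeats("GGACACACACTTTTTAG", 2, 3, out, 8)` reports a single repeat, four copies of the motif AC starting at offset 2. Calling the function once per motif length (for example 1 through 6) would enumerate short tandem repeats of every unit size; a real tool would also need to handle ambiguous bases and imperfect repeats, which this sketch ignores.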