Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece.
University Research Institute of Maternal and Child Health & Precision Medicine, National and Kapodistrian University of Athens, "Aghia Sophia" Children's Hospital, Athens, Greece.
Adv Exp Med Biol. 2023;1423:59-78. doi: 10.1007/978-3-031-31978-5_6.
SARS-CoV-2 is the coronavirus responsible for one of the most serious pandemics of modern times, with lasting and multifaceted effects. By late 2021, SARS-CoV-2 had infected more than 180 million people and killed more than 3 million. The virus enters human cells by binding to ACE2 via its surface spike protein and causes a complex disease of the respiratory system, termed COVID-19. Vaccination efforts are under way to hinder viral spread, and therapeutics are under development. To this end, scientific attention is shifting toward variants and SNPs that affect disease factors such as susceptibility and severity. This genomic grammar, tightly related to the dark part of our genome, can be explored with modern methods such as natural language processing. We present a semantic analysis of SARS-CoV-2-related publications, which yielded a repertoire of SNPs, genes, and disease ontologies. Population data from the 1000 Genomes Project were subsequently integrated into the pipeline. Data mining approaches of this scale have the potential to elucidate the complex interaction between COVID-19 pathogenesis and host genetic variation; the resulting knowledge can facilitate the management of high-risk groups and support efforts toward precision medicine.
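As a rough illustration of the kind of literature mining described above, the sketch below pulls dbSNP-style identifiers (rsIDs) out of publication text and counts how many texts mention each one. It is not the authors' pipeline: a plain regular expression stands in for the semantic analysis, the function names `extract_rsids` and `count_rsids` are hypothetical, and the sample sentences and rsIDs are made-up placeholders.

```python
import re
from collections import Counter

# dbSNP reference SNP identifiers are written as "rs" followed by digits.
# Assumption: a simple pattern match is enough for this illustration.
RSID_PATTERN = re.compile(r"\brs\d+\b", re.IGNORECASE)


def extract_rsids(text):
    """Return the rsIDs mentioned in a text, lowercased, in order of appearance."""
    return [match.group(0).lower() for match in RSID_PATTERN.finditer(text)]


def count_rsids(abstracts):
    """Count how many abstracts mention each rsID (one count per abstract)."""
    counts = Counter()
    for abstract in abstracts:
        # A set ensures repeated mentions in one abstract are counted once.
        counts.update(set(extract_rsids(abstract)))
    return counts


if __name__ == "__main__":
    # Placeholder sentences with made-up identifiers, not real findings.
    sample_abstracts = [
        "Variants rs123 and RS456 were discussed; rs123 appears twice.",
        "A second text mentions rs123 only.",
    ]
    for rsid, n in count_rsids(sample_abstracts).most_common():
        print(rsid, n)
```

In a fuller workflow, the resulting identifiers would then be looked up in population resources such as the 1000 Genomes Project; that step is omitted here because the abstract does not describe how it was done.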