A novel statistical method predicts mutability of the genomic segments of the SARS-CoV-2 virus.

Author information

Darooneh Amir Hossein, Przedborski Michelle, Kohandel Mohammad

Affiliation

Department of Applied Mathematics, University of Waterloo, Waterloo, ON, Canada.

Publication information

QRB Discov. 2021 Dec 13;3:e1. doi: 10.1017/qrd.2021.13. eCollection 2022.

Abstract

The SARS-CoV-2 virus has caused the largest pandemic of the 21st century, with hundreds of millions of cases and tens of millions of fatalities. Scientists around the world are racing to develop vaccines and new pharmaceuticals to overcome the pandemic and offer effective treatments for COVID-19. Consequently, there is an essential need to better understand how viral mutations affect the pathogenesis of SARS-CoV-2 and to identify conserved segments of the viral genome that can serve as stable targets for novel therapeutics. Here, we introduce a text-mining method that estimates the mutability of genomic segments directly from a reference (ancestral) whole-genome sequence. The method calculates the importance of each genomic segment from its frequency and spatial distribution over the whole genome. To validate our approach, we perform a large-scale analysis of the viral mutations in nearly 80,000 publicly available SARS-CoV-2 predecessor whole-genome sequences and show that the observed mutations are highly correlated with the segments predicted by the statistical method used for keyword detection. Importantly, these correlations hold at the codon and gene levels, as well as for gene coding regions. Using the text-mining method, we further identify codon sequences that are potential candidates for siRNA-based antiviral drugs. Significantly, one of the candidates identified in this work corresponds to the first seven codons of an epitope of the spike glycoprotein, which is the only SARS-CoV-2 immunogenic peptide without a match to a human protein.
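The abstract describes scoring genomic segments by their frequency and spatial distribution, as in keyword detection, but does not state the exact statistic. The sketch below is a minimal illustration under two assumptions: segments are codons read in a single frame (frame 0) across the whole sequence, ignoring gene boundaries, and the score is the normalized standard deviation of inter-occurrence spacings, a level-statistics keyword measure from text mining. For randomly scattered occurrences the spacings are roughly geometric, so std/mean is about sqrt(1 - p) with p = n/N; dividing by that baseline gives about 1 for random placement and values above 1 for clustered segments. The function names (`codon_positions`, `sigma_nor`, `rank_codons`) are hypothetical, and this is not presented as the authors' implementation.

```python
import statistics
from collections import defaultdict
from typing import Optional


def codon_positions(genome: str) -> dict[str, list[int]]:
    """Split the sequence into non-overlapping codons (frame 0, an assumption)
    and record the codon-index positions of each distinct codon."""
    seq = genome.upper()
    positions: dict[str, list[int]] = defaultdict(list)
    # Drop a trailing partial codon, if any.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        positions[seq[i:i + 3]].append(i // 3)
    return dict(positions)


def sigma_nor(pos: list[int], total: int) -> Optional[float]:
    """Normalized standard deviation of spacings between occurrences.

    Returns ~1 for randomly placed occurrences, >1 for clustered ones,
    and <1 for regularly spaced ones. None if it cannot be computed.
    """
    p = len(pos) / total
    if len(pos) < 3 or p >= 1:  # need at least two spacings and p < 1
        return None
    spacings = [b - a for a, b in zip(pos, pos[1:])]
    mean = statistics.mean(spacings)
    std = statistics.pstdev(spacings)
    # Divide by the geometric (random-placement) baseline sqrt(1 - p).
    return (std / mean) / (1 - p) ** 0.5


def rank_codons(genome: str) -> list[tuple[str, float]]:
    """Rank codons from most clustered (keyword-like) to least."""
    positions = codon_positions(genome)
    total = sum(len(v) for v in positions.values())
    scores: dict[str, float] = {}
    for codon, pos in positions.items():
        score = sigma_nor(pos, total)
        if score is not None:
            scores[codon] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy 20-codon sequence: ATG evenly spaced, GGC clustered, TTT filler.
    layout = ["TTT"] * 20
    for i in (0, 5, 10, 15):
        layout[i] = "ATG"
    for i in (1, 2, 3, 17):
        layout[i] = "GGC"
    for codon, score in rank_codons("".join(layout)):
        print(codon, round(score, 3))
```

In this toy sequence the clustered codon GGC ranks first and the evenly spaced ATG scores 0. To mirror the validation step described in the abstract, such scores could be rank-correlated against per-codon mutation counts from aligned genomes; that step is not shown here.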

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06b1/10392689/c540e4c0f730/S2633289221000132_fig1.jpg