School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331.
Baidu Research, Sunnyvale, CA 94089.
Proc Natl Acad Sci U S A. 2021 Dec 28;118(52). doi: 10.1073/pnas.2116269118.
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts to model SARS-CoV-2 structures resort to single-sequence folding and to local folding methods with short window sizes, which inevitably neglect the long-range interactions that are crucial to RNA function. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis of SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction is not only close to experimentally guided models for local structures but also goes far beyond them, capturing end-to-end pairs between the 5' and 3' untranslated regions (UTRs) (∼29,800 nt apart) that perfectly match those found in a purely experimental study. Furthermore, LinearTurboFold identifies previously undiscovered conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, small interfering RNAs (siRNAs), CRISPR-Cas13 guide RNAs, and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and to full-length genome studies, and it will be a useful tool in fighting current and future pandemics.
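To make the scaling argument concrete, the following Python sketch compares how cost grows when moving from a shorter RNA to a ∼30,000-nt coronavirus genome under a cubic model versus a linear one. It is an illustration only, not a model of any specific tool: constant factors and the number of homologs are ignored, so only growth ratios are meaningful. The 1,500-nt reference length and the helper name relative_cost are hypothetical choices made for this example.

```python
# Rough growth comparison: O(n**3) (best case for simultaneous fold-and-align
# methods, per the abstract) versus O(n) (a linear-time method).
# Constant factors and the number of homologs are ignored, so only ratios
# between sequence lengths are meaningful here.

def relative_cost(n: int, n_ref: int, exponent: int) -> float:
    """Cost at length n relative to length n_ref under an O(n**exponent) model."""
    return (n / n_ref) ** exponent

REF_LEN = 1_500        # hypothetical reference RNA length (nt), for illustration only
GENOME_LEN = 30_000    # approximate SARS-CoV-2 genome length (nt), from the abstract

cubic = relative_cost(GENOME_LEN, REF_LEN, 3)
linear = relative_cost(GENOME_LEN, REF_LEN, 1)

print(f"cubic:  {cubic:,.0f}x the reference cost")
print(f"linear: {linear:,.0f}x the reference cost")
```

Running it prints 8,000x for the cubic model and 20x for the linear model: a sequence 20 times longer costs 8,000 times more under cubic scaling but only 20 times more under linear scaling, which is why full-genome analysis becomes tractable only with a linear-time approach.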