Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France.
Department of Mathematics, Science for Life Laboratory, Stockholm University, 106 91 Stockholm, Sweden.
Nucleic Acids Res. 2024 Sep 23;52(17):e82. doi: 10.1093/nar/gkae687.
Viral subgenomic RNA (sgRNA) plays a major role in the replication, pathogenicity, and evolution of SARS-CoV-2. Sequencing protocols such as the ARTIC protocol have recently been established. However, because of virus-specific biological processes, analyzing sgRNA from virus-specific sequencing reads is a computational challenge. Current methods rely on computational tools designed for eukaryotic genomes, leaving a gap in tools designed specifically for sgRNA detection. To address this, we make two contributions. First, we present sgENERATE, an evaluation pipeline for studying the accuracy and efficacy of sgRNA detection tools under the popular ARTIC sequencing protocol. Using sgENERATE, we evaluate periscope, a recently introduced tool that detects sgRNA from ARTIC sequencing data. We find that periscope produces biased predictions and has high computational costs. Second, using the information produced by sgENERATE, we redesign the algorithm in periscope to use multiple references derived from canonical sgRNAs, which mitigates alignment issues and improves detection of both canonical and non-canonical sgRNA. We evaluate periscope and our algorithm, periscope_multi, on simulated and biological sequencing datasets and demonstrate that periscope_multi detects sgRNA more accurately. Our contribution advances tools for studying viral sgRNA, paving the way for more accurate and efficient analyses in the context of viral RNA discovery.
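The idea of aligning reads against multiple references derived from canonical sgRNAs can be illustrated with a minimal sketch. It is not the periscope_multi implementation. It only assumes the general coronavirus model in which each canonical sgRNA consists of the 5' leader joined to the start of an ORF body. The function names, the toy genome, and all coordinates below are hypothetical and chosen for illustration only.

```python
from typing import Dict


def build_sgrna_references(
    genome: str, leader_end: int, body_starts: Dict[str, int]
) -> Dict[str, str]:
    """Build one reference per canonical sgRNA (illustrative only).

    genome      -- full genomic sequence (toy data here)
    leader_end  -- 0-based exclusive end of the 5' leader (hypothetical value)
    body_starts -- ORF name -> 0-based start of the sgRNA body (hypothetical)
    """
    if not 0 < leader_end <= len(genome):
        raise ValueError("leader_end must lie inside the genome")

    leader = genome[:leader_end]
    # Keep the full genome so genomic reads still have a reference.
    references = {"genomic": genome}

    for orf, start in body_starts.items():
        if not leader_end <= start < len(genome):
            raise ValueError(f"body start for {orf} is outside the valid range")
        # Leader fused directly to the body: the region in between is skipped.
        references[f"sgRNA_{orf}"] = leader + genome[start:]

    return references


def to_fasta(references: Dict[str, str]) -> str:
    """Serialize the references as FASTA text, one record per reference."""
    return "".join(f">{name}\n{seq}\n" for name, seq in references.items())


# Toy 30-nt "genome": 10-nt leader, then two made-up body regions.
toy_genome = "ACGTACGTAA" + "GGGGGGGGGG" + "CCCCCCCCCC"
toy_refs = build_sgrna_references(toy_genome, 10, {"S": 15, "N": 20})
fasta_text = to_fasta(toy_refs)
```

With a reference set of this kind, a read spanning a leader-body junction can align end to end against its matching sgRNA reference instead of being split or soft-clipped against the genome alone. Real coordinates would have to come from an annotated genome rather than the placeholder values used here.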