Division of Cell Biology, Department of Biomedical and Clinical Sciences, Linköping University, Linköping SE-58185, Sweden.
Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad144.
Feature-based counting is commonly used in RNA-sequencing (RNA-seq) analyses. In this approach, sequences must align to target features (such as genes or non-coding RNAs), and related sequences with different compositions are counted into the same feature. Consequently, sequence integrity is lost, making results less traceable against the raw data.

Small RNA (sRNA) often maps to multiple features and shows remarkable diversity in form and function. Applying feature-based strategies to sRNA may therefore increase the risk of misinterpretation. We present a strategy for sRNA-seq analysis that preserves the integrity of the raw sequence, making the data lineage fully traceable. We have consolidated this strategy into Seqpac, an R package that makes a complete sRNA analysis available on multiple platforms. Using published biological data, we show that Seqpac reveals hidden bias and adds new insights to studies that were previously analyzed using feature-based counting.

We have identified limitations in the current analysis of RNA-seq data, which we call the traceability dilemma in alignment-based sequencing strategies. By building a flexible framework that preserves the integrity of the read sequence throughout the analysis, we demonstrate better interpretability in sRNA-seq experiments, which are particularly vulnerable to this problem. Applying similar strategies to other transcriptomic workflows may help resolve the replication crisis experienced by many fields that depend on transcriptome analyses.
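A toy sketch can make the difference between feature-based and sequence-based counting concrete. It is written in Python for illustration only and is not Seqpac's code or API (Seqpac itself is an R package). The reads, the feature names `feature_X`, `feature_Y` and `feature_Z`, and the helper functions are all hypothetical. The sketch assumes one read per string and a precomputed sequence-to-feature annotation.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: trimmed sRNA reads from one sample (not real data).
reads = ["AAGCTTGA", "AAGCTTGA", "AAGCTTGAC", "CCGTAAGT", "AAGCTTGA"]

# Hypothetical annotation: the features each unique sequence aligns to.
# "AAGCTTGAC" is a length variant of "AAGCTTGA"; "CCGTAAGT" is a multi-mapper.
annotation = {
    "AAGCTTGA": ["feature_X"],
    "AAGCTTGAC": ["feature_X"],
    "CCGTAAGT": ["feature_Y", "feature_Z"],
}


def sequence_counts(reads):
    """Sequence-based counting: one entry per unique read sequence."""
    return dict(Counter(reads))


def feature_counts(seq_counts, annotation):
    """Feature-based counting: each read is credited to every feature it aligns to."""
    totals = defaultdict(int)
    for seq, n in seq_counts.items():
        for feature in annotation.get(seq, ["unannotated"]):
            totals[feature] += n
    return dict(totals)


seq_table = sequence_counts(reads)
feat_table = feature_counts(seq_table, annotation)

print(seq_table)   # {'AAGCTTGA': 3, 'AAGCTTGAC': 1, 'CCGTAAGT': 1}
print(feat_table)  # {'feature_X': 4, 'feature_Y': 1, 'feature_Z': 1}

# The feature table merges two distinct sequences into feature_X, and it
# counts the multi-mapping read twice (6 feature counts from 5 reads).
print(sum(seq_table.values()), sum(feat_table.values()))  # 5 6

# Keeping the sequence table means any feature count can be traced back to
# the raw sequences behind it. The reverse is impossible from feat_table alone.
contributors = {s: n for s, n in seq_table.items() if "feature_X" in annotation[s]}
print(contributors)  # {'AAGCTTGA': 3, 'AAGCTTGAC': 1}
```

The design choice illustrated here is to count unique sequences first and attach annotations afterwards. Feature-level summaries can then always be derived from the sequence table, without discarding the sequences themselves.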
Seqpac is available on Bioconductor (https://bioconductor.org/packages/seqpac) and GitHub (https://github.com/danis102/seqpac).