Department of Clinical Chemistry, Amsterdam UMC - location VUmc, Amsterdam, The Netherlands.
ORTEC Netherlands, Zoetermeer, The Netherlands.
BMC Bioinformatics. 2021 Jun 26;22(1):347. doi: 10.1186/s12859-021-04263-9.
Computational tools analyzing RNA-sequencing data have boosted alternative splicing research by identifying and assessing differentially spliced genes. However, common alternative splicing analysis tools differ substantially in their statistical analyses and general performance. This report compares the computational performance (CPU utilization and RAM usage) of three event-level splicing tools: rMATS, MISO, and SUPPA2. Additionally, the concordance between tool outputs was investigated.
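The two resource metrics compared here can be collected around any command-line tool run. The sketch below is a minimal illustration under stated assumptions, not the study's benchmarking harness. It assumes a Unix-like host, because Python's `resource` module and `os.getloadavg` are unavailable on Windows. The helper names `profile_command` and `cpu_saturation` are hypothetical. The sketch records wall-clock job time, the peak resident memory of child processes, and the highest 1-minute load average observed during the run divided by the number of CPUs. A ratio above 1.0 is read here as exceeding maximum CPU utilization.

```python
import os
import resource
import subprocess
import sys
import time


def cpu_saturation(load_avg: float, n_cpus: int) -> float:
    """Load average per CPU; values above 1.0 mean more runnable work than cores."""
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return load_avg / n_cpus


def profile_command(cmd: list[str], interval: float = 1.0) -> dict:
    """Run cmd to completion and report job time, peak child RSS and CPU saturation.

    Hypothetical helper: cmd would be, e.g., an rMATS, MISO or SUPPA2 invocation.
    """
    start = time.perf_counter()
    proc = subprocess.Popen(cmd)
    # Sample the system-wide 1-minute load average while the job runs, keeping the maximum.
    peak_load = os.getloadavg()[0]
    while proc.poll() is None:
        time.sleep(interval)
        peak_load = max(peak_load, os.getloadavg()[0])
    wall_seconds = time.perf_counter() - start
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)

    # ru_maxrss is the largest resident set size among all waited-for children so far;
    # Linux reports it in kilobytes, macOS in bytes.
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return {
        "wall_seconds": wall_seconds,
        "peak_rss": peak_rss,
        "cpu_saturation": cpu_saturation(peak_load, os.cpu_count() or 1),
    }


if __name__ == "__main__":
    # Placeholder workload standing in for a splicing-tool run.
    print(profile_command([sys.executable, "-c", "x = [0] * 10**6"], interval=0.05))
```

The load average is system-wide, so on a shared machine it also counts other processes. It approximates the job's CPU demand only on an otherwise idle VM.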
Log-linear relations were found between job times and dataset size for all splicing tools and all virtual machine (VM) configurations. MISO had the highest job times in all analyses, irrespective of VM size, and MISO analyses also exceeded maximum CPU utilization on all VM sizes. rMATS and SUPPA2 load averages were relatively low in both size and replicate comparisons, not nearing maximum CPU utilization even in the VM simulating the lowest computational power (D2 VM). RAM usage of rMATS and SUPPA2 did not exceed 20% of maximum RAM in either size or replicate comparisons, whereas MISO reached maximum RAM usage in the D2 VM analyses of input size. Correlation coefficients of the differential splicing analyses showed high correlation (β > 80%) between tool outputs, except for retained intron (RI) events in the rMATS/MISO and rMATS/SUPPA2 comparisons (β < 60%).
Prior to RNA-seq analyses, users should consider job time, number of replicates and splice event type of interest to determine the optimal alternative splicing tool. In general, rMATS is superior to both MISO and SUPPA2 in computational performance. Analysis outputs show high concordance between tools, with the exception of RI events.