Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures.
Affiliations
The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia.
Publication information
Nat Methods. 2023 Nov;20(11):1810-1821. doi: 10.1038/s41592-023-02026-3. Epub 2023 Oct 2.
The lack of benchmark data sets with built-in ground truth makes it challenging to compare the performance of existing long-read isoform detection and differential expression analysis workflows. Here, we present a benchmark experiment using two human lung adenocarcinoma cell lines, each profiled in triplicate together with synthetic, spliced, spike-in RNAs (sequins). Samples were deeply sequenced on both Illumina short-read and Oxford Nanopore Technologies long-read platforms. Alongside the ground truth available via the sequins, we created in silico mixture samples to allow performance assessment in the absence of true positives or true negatives. Our results show that StringTie2 and bambu outperformed the other four of the six isoform detection tools tested, and that DESeq2, edgeR and limma-voom were best among the five differential transcript expression tools tested. There was no clear front-runner among the five tools compared for differential transcript usage analysis, which suggests that further methods development is needed for this application.
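The idea behind in silico mixtures can be illustrated with a small sketch. It assumes, purely for illustration, that a mixture is built by sampling reads from two pure libraries in a fixed proportion (a 75:25 ratio here) and that each read has already been assigned to a transcript ID; the input format, the ratio, and the helper names (`mix_reads`, `expected_mixture_cpm`, `consistent_with_pure`) are hypothetical and are not the authors' pipeline. Because the mixture is a known linear combination of the pure samples, its expected expression lies between theirs, so any fold change estimated against one pure sample should have the same sign as, and no greater magnitude than, the fold change between the pure samples. This gives a consistency check that needs no list of true positives or true negatives.

```python
import math
import random
from collections import Counter


def mix_reads(reads_a, reads_b, prop_a, n_reads, seed=0):
    """Hypothetical helper: build an in silico mixture by sampling reads
    without replacement, taking a fraction prop_a of n_reads from sample A
    and the remainder from sample B. Each read is a transcript ID."""
    rng = random.Random(seed)
    n_a = round(n_reads * prop_a)
    n_b = n_reads - n_a
    mixed = rng.sample(reads_a, n_a) + rng.sample(reads_b, n_b)
    return Counter(mixed)


def cpm(counts):
    """Counts per million for a Counter of transcript -> read count."""
    total = sum(counts.values())
    return {tx: 1e6 * c / total for tx, c in counts.items()}


def expected_mixture_cpm(cpm_a, cpm_b, prop_a):
    """Expected mixture CPM is a linear combination of the pure-sample CPMs."""
    txs = set(cpm_a) | set(cpm_b)
    return {tx: prop_a * cpm_a.get(tx, 0.0) + (1 - prop_a) * cpm_b.get(tx, 0.0)
            for tx in txs}


def log2fc(x, y, pseudo=1.0):
    """log2 fold change with a pseudo-count to avoid division by zero."""
    return math.log2((x + pseudo) / (y + pseudo))


def consistent_with_pure(cpm_a, cpm_b, cpm_mix, tol=1e-9):
    """For each transcript, True if the mixture-vs-B fold change lies between
    0 and the A-vs-B fold change (same sign, smaller or equal magnitude)."""
    result = {}
    for tx in set(cpm_a) | set(cpm_b):
        full = log2fc(cpm_a.get(tx, 0.0), cpm_b.get(tx, 0.0))
        part = log2fc(cpm_mix.get(tx, 0.0), cpm_b.get(tx, 0.0))
        lo, hi = min(0.0, full), max(0.0, full)
        result[tx] = lo - tol <= part <= hi + tol
    return result


if __name__ == "__main__":
    # Toy pure samples: 1000 reads each over three transcripts (made-up numbers).
    reads_a = ["tx1"] * 600 + ["tx2"] * 300 + ["tx3"] * 100
    reads_b = ["tx1"] * 200 + ["tx2"] * 300 + ["tx3"] * 500
    cpm_a, cpm_b = cpm(Counter(reads_a)), cpm(Counter(reads_b))

    expected = expected_mixture_cpm(cpm_a, cpm_b, 0.75)
    print(sorted(expected.items()))
    # [('tx1', 500000.0), ('tx2', 300000.0), ('tx3', 200000.0)]

    # A sampled 75:25 mixture; its CPMs scatter around the expected values.
    observed = cpm(mix_reads(reads_a, reads_b, 0.75, 800, seed=1))
    print(consistent_with_pure(cpm_a, cpm_b, observed))
```

In a real benchmark, the `observed` values would instead be a tool's quantification of the mixture sample. Transcripts that fail the check point to estimation error or noise rather than biology, because the mixing proportions are known by construction.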