Wissel David, Mehlferber Madison M, Nguyen Khue M, Pavelko Vasilii, Tseng Elizabeth, Robinson Mark D, Sheynkman Gloria M
Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland.
Department of Computer Science, ETH Zurich, Zurich, Switzerland.
bioRxiv. 2025 Jun 2:2025.05.30.656561. doi: 10.1101/2025.05.30.656561.
Long-read RNA-sequencing (lrRNA-seq) enables the profiling of full-length transcripts, but its quantification accuracy has not yet been robustly established. This is especially true for PacBio lrRNA-seq data, which were previously available only at low to moderate depth. Using a high-depth PacBio Kinnex lrRNA-seq dataset, sample-matched with Illumina short-read RNA-seq, we performed rigorous benchmarking to characterize quantification accuracy between platforms on a dataset representing differentiation of induced pluripotent stem cells into primordial endothelial cells. We identified biases impacting transcript quantification, including inferential variability within Illumina data, which can bias transcript abundance estimates for genes with complex splicing, as well as length biases in Kinnex data. Overall, PacBio and Illumina quantifications were strongly concordant, supporting PacBio Kinnex as a reliable method for transcriptome profiling that enables downstream biological analyses.