Ji Hyun Joo, Pertea Mihaela
Center for Computational Biology, Johns Hopkins University; Baltimore, MD.
Department of Computer Science, Johns Hopkins University; Baltimore, MD.
bioRxiv. 2024 Aug 17:2024.04.13.589356. doi: 10.1101/2024.04.13.589356.
Recently developed long-read RNA sequencing technologies promise a more accurate and comprehensive view of transcriptomes than short-read sequencers, primarily because they can sequence transcripts at full length. Realizing this potential, however, requires computational tools tailored to long reads, which have a higher error rate than short reads. Existing methods for assembling and quantifying long-read data often disagree on which transcripts are expressed and on their abundance levels, leaving researchers with little confidence in the transcriptomes produced from these data. One way to address the uncertainties in transcriptome assembly and quantification is to assign the long reads to transcripts, which enables a more detailed characterization of transcript support at the read level. Here, we introduce TranSigner, a versatile tool that assigns long reads to any input transcriptome. TranSigner consists of three consecutive modules that perform: read alignment to the given transcripts, computation of read-to-transcript compatibility based on alignment scores and positions, and execution of an expectation-maximization algorithm to probabilistically assign reads to transcripts and estimate transcript abundances. Using simulated data and experimental datasets from three well-studied organisms, we demonstrate that TranSigner achieves accurate read assignments and obtains higher accuracy in transcript abundance estimation than existing tools.
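The expectation-maximization step described above can be illustrated with a minimal sketch. This is not TranSigner's implementation: the function name `em_assign`, the input layout, and the toy scores are all hypothetical, and the compatibility scores are assumed to have been computed already from the alignments. The sketch only shows the general pattern of alternating between soft read assignments (E-step) and abundance re-estimation (M-step).

```python
def em_assign(compat, n_iter=100, tol=1e-9):
    """Generic EM over read-to-transcript compatibility scores.

    compat: dict mapping read id -> dict of transcript id -> positive score.
    Returns (rho, post): relative abundances and per-read posteriors.
    Hypothetical helper for illustration only, not TranSigner's API.
    """
    if not compat or any(not scores for scores in compat.values()):
        raise ValueError("every read needs at least one compatible transcript")
    if any(s <= 0 for scores in compat.values() for s in scores.values()):
        raise ValueError("compatibility scores must be positive")

    transcripts = sorted({t for scores in compat.values() for t in scores})
    # Start from uniform relative abundances.
    rho = {t: 1.0 / len(transcripts) for t in transcripts}
    post = {}

    for _ in range(n_iter):
        # E-step: posterior probability of each read over its transcripts,
        # proportional to (current abundance) x (compatibility score).
        counts = dict.fromkeys(transcripts, 0.0)
        for read, scores in compat.items():
            weights = {t: rho[t] * s for t, s in scores.items()}
            total = sum(weights.values())
            post[read] = {t: w / total for t, w in weights.items()}
            for t, p in post[read].items():
                counts[t] += p

        # M-step: abundances become the expected fraction of reads.
        new_rho = {t: counts[t] / len(compat) for t in transcripts}
        delta = max(abs(new_rho[t] - rho[t]) for t in transcripts)
        rho = new_rho
        if delta < tol:
            break

    return rho, post


# Toy input (made-up scores): r3 aligns equally well to tA and tB.
compat = {
    "r1": {"tA": 1.0},
    "r2": {"tA": 1.0},
    "r3": {"tA": 0.5, "tB": 0.5},
    "r4": {"tB": 1.0},
}
rho, post = em_assign(compat)
```

In this toy input the ambiguous read `r3` is pulled toward `tA`, because two unambiguous reads already support `tA` and only one supports `tB`; the abundance estimate for `tA` settles near 2/3. A real tool would also need to decide how to turn soft posteriors into final read assignments, which this sketch leaves open.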