Department of Computer Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL, 32816, USA.
Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Twin Cities, 420 Washington Ave. S.E., Minneapolis, MN, 55455, USA.
BMC Bioinformatics. 2022 Sep 28;23(Suppl 3):396. doi: 10.1186/s12859-022-04939-w.
The eukaryotic genome can produce multiple isoforms from a single gene through alternative polyadenylation (APA) during pre-mRNA processing. APA in the 3'-untranslated region (3'-UTR) of mRNA produces transcripts with shorter or longer 3'-UTRs. The 3'-UTR often serves as a binding platform for microRNAs and RNA-binding proteins, which affect the fate of the mRNA transcript. Thus, 3'-UTR APA is known to modulate translation and provides a means to regulate gene expression at the post-transcriptional level. Current bioinformatics pipelines have limited capability in profiling 3'-UTR APA events because of incomplete annotations and low analytical resolution: widely available pipelines do not reference actionable polyadenylation (cleavage) sites but instead infer 3'-UTR APA from RNA-seq read coverage alone, which causes false-positive identifications. To overcome these limitations, we developed APA-Scan, a robust program that identifies 3'-UTR APA events and visualizes RNA-seq short-read coverage alongside gene annotations.
APA-Scan uses either predicted or experimentally validated actionable polyadenylation signals as a reference for polyadenylation sites and quantifies long and short 3'-UTR transcripts in the RNA-seq data. APA-Scan works in three major steps: (i) calculate the read coverage of the 3'-UTR regions of genes; (ii) identify potential APA sites and evaluate the significance of the events between two biological conditions; (iii) plot user-specified events with the 3'-UTR annotation and the read coverage of the 3'-UTR regions. APA-Scan is implemented in Python 3. Source code and a comprehensive user's manual are freely available at https://github.com/compbiolabucf/APA-Scan .
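Steps (i) and (ii) can be illustrated with a minimal sketch. It is not taken from the APA-Scan source code: the function names, the plus-strand coordinate convention, the use of mean per-base depth as a stand-in for transcript abundance, and the choice of a Pearson chi-squared test on a 2x2 table are all assumptions made for illustration, since the abstract does not name the statistical test.

```python
import math


def mean_coverage(coverage, start, end):
    """Mean per-base read depth over the half-open interval [start, end)."""
    if end <= start:
        raise ValueError("interval must contain at least one base")
    return sum(coverage[start:end]) / (end - start)


def split_utr_coverage(coverage, utr_start, utr_end, pa_site):
    """Mean depth upstream and downstream of a candidate cleavage site.

    Assumes a plus-strand gene: upstream is [utr_start, pa_site) and
    downstream is [pa_site, utr_end). All names here are hypothetical.
    """
    if not utr_start < pa_site < utr_end:
        raise ValueError("cleavage site must lie strictly inside the 3'-UTR")
    upstream = mean_coverage(coverage, utr_start, pa_site)
    downstream = mean_coverage(coverage, pa_site, utr_end)
    return upstream, downstream


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)
    for the contingency table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy per-base depth for a 100 bp 3'-UTR with a candidate site at position 60.
cond_a = [50] * 60 + [40] * 40   # condition A: coverage persists past the site
cond_b = [50] * 60 + [5] * 40    # condition B: coverage drops after the site

up_a, down_a = split_utr_coverage(cond_a, 0, 100, 60)
up_b, down_b = split_utr_coverage(cond_b, 0, 100, 60)
stat, p_value = chi2_2x2(up_a, down_a, up_b, down_b)
print(f"chi2={stat:.2f}, p={p_value:.2e}")
```

In this toy input the downstream coverage collapses in condition B only, so the test reports a small p-value, which is the pattern one would expect for 3'-UTR shortening. A real analysis would read coverage from aligned RNA-seq data, handle minus-strand genes, and correct for multiple testing across genes.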
APA-Scan was applied to both simulated and real RNA-seq datasets and compared with two widely used baselines, DaPars and APAtrap. In simulation, APA-Scan significantly improved the accuracy of 3'-UTR APA identification compared to both baselines. The performance of APA-Scan was also validated with 3'-end-seq data and qPCR on mouse embryonic fibroblast cells. The experiments confirm that APA-Scan can detect unannotated 3'-UTR APA events and improve genome annotation.
APA-Scan is a comprehensive computational pipeline for detecting transcriptome-wide 3'-UTR APA events. The pipeline integrates RNA-seq and 3'-end-seq data and can efficiently identify significant events and display them in high-resolution short-read coverage plots.