Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China.
Department of Automation, Xiamen University, Xiamen, Fujian 361005, China.
Bioinformatics. 2018 Jun 1;34(11):1841-1849. doi: 10.1093/bioinformatics/bty029.
Alternative polyadenylation (APA) is increasingly recognized as a crucial mechanism that contributes to transcriptome diversity and the regulation of gene expression. Now that RNA-seq has become a routine protocol for transcriptome analysis, there is great interest in new computational methods that leverage this unprecedented collection of RNA-seq data to extract and quantify APA dynamics in these transcriptomes. However, research progress in this area has been relatively limited. Conventional methods rely either on transcript assembly to determine transcript 3' ends or on annotated poly(A) sites. Moreover, they can neither identify more than two poly(A) sites in a gene nor detect dynamic APA site usage across more than two poly(A) sites.
We developed an approach called APAtrap, based on a mean squared error model, to identify and quantify APA sites from RNA-seq data. APAtrap can identify novel 3' UTRs and 3' UTR extensions, which helps locate potential poly(A) sites in previously overlooked regions and improves genome annotations. APAtrap also aims to tally all potential poly(A) sites and to detect genes with differential APA site usage between conditions. Extensive comparisons of APAtrap with two other recent methods, ChangePoint and DaPars, using RNA-seq datasets from simulation studies, human, and Arabidopsis, demonstrate the efficacy and flexibility of APAtrap for any organism with an annotated genome.
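To illustrate the general idea behind a mean squared error model for locating a poly(A) site, the sketch below scans a per-base read coverage profile along a 3' UTR and picks the split point at which approximating each side by its own mean gives the lowest overall mean squared error. This is a minimal illustration only, not APAtrap's actual implementation: the function name `best_breakpoint`, the `min_segment` parameter, and the single-breakpoint, piecewise-constant formulation are all assumptions made for the example.

```python
from typing import List, Tuple


def best_breakpoint(coverage: List[float], min_segment: int = 1) -> Tuple[int, float]:
    """Return (index, mse) for the split that minimizes the mean squared error.

    Hypothetical helper for illustration: positions [0, index) are modeled by
    their mean coverage and positions [index, n) by theirs.
    """
    n = len(coverage)
    if min_segment < 1 or n < 2 * min_segment:
        raise ValueError("coverage is too short for the requested segment size")

    # Prefix sums of values and squared values give O(1) segment statistics.
    prefix = [0.0]
    prefix_sq = [0.0]
    for value in coverage:
        prefix.append(prefix[-1] + value)
        prefix_sq.append(prefix_sq[-1] + value * value)

    def sse(start: int, end: int) -> float:
        """Sum of squared deviations from the mean over [start, end)."""
        length = end - start
        total = prefix[end] - prefix[start]
        total_sq = prefix_sq[end] - prefix_sq[start]
        return total_sq - total * total / length

    best_index, best_mse = -1, float("inf")
    for k in range(min_segment, n - min_segment + 1):
        mse = (sse(0, k) + sse(k, n)) / n
        if mse < best_mse:
            best_index, best_mse = k, mse
    return best_index, best_mse


if __name__ == "__main__":
    # Toy profile: coverage drops from 10 to 2 after the fifth position,
    # as might happen downstream of a proximal poly(A) site.
    profile = [10.0] * 5 + [2.0] * 5
    index, mse = best_breakpoint(profile)
    print(index, mse)
```

On the toy profile, the split falls at index 5, where the coverage drops. Handling more than two poly(A) sites per gene, as the abstract describes, would require extending this sketch to several breakpoints, for example by applying the search repeatedly within each resulting segment.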
APAtrap is freely available for download at https://apatrap.sourceforge.io.
Contact: liqq@xmu.edu.cn or xhuister@xmu.edu.cn.
Supplementary data are available at Bioinformatics online.