Cui Xiaodong, Meng Jia, Zhang Shaowu, Chen Yidong, Huang Yufei
Department of Electrical and Computer Engineering, University of Texas at San Antonio, TX 78249, USA.
Department of Biological Science, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China.
Bioinformatics. 2016 Jun 15;32(12):i378-i385. doi: 10.1093/bioinformatics/btw281.
N(6)-methyl-adenosine (m(6)A) is the most prevalent mRNA methylation, and precise prediction of its location on mRNA is important for understanding its function. A recent sequencing technology, methylated RNA immunoprecipitation sequencing (MeRIP-seq), has been developed for transcriptome-wide profiling of m(6)A. We previously developed a peak calling algorithm called exomePeak. However, exomePeak over-simplifies the data characteristics: it ignores both the variance of read counts among replicates and the dependency of reads across a site region. A new model is needed to address these issues in MeRIP-seq data and further improve performance.
We propose a novel, graphical model-based peak calling method, MeTPeak, for transcriptome-wide detection of m(6)A sites from MeRIP-seq data. MeTPeak explicitly models the read count of an m(6)A site. It introduces a hierarchical layer of Beta variables to capture the variance among replicates and a hidden Markov model to characterize the reads dependency across a site. In addition, we developed a constrained Newton's method with a log-barrier function to compute the Beta parameters, which are analytically intractable and constrained to be positive. We applied our algorithm to simulated and real biological datasets and demonstrated significant improvement in detection performance and robustness over exomePeak. Predictions on publicly available MeRIP-seq datasets were also validated and shown to recapitulate the known patterns of m(6)A, further supporting the improved performance of MeTPeak.
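To make the combination of Beta-distributed variance and hidden Markov dependency more concrete, the following is a minimal sketch in Python, not the MeTPeak implementation. It assumes each bin of a site is summarized as a pair (IP reads, IP plus input reads), that each hidden state (background or peak) emits counts from a beta-binomial distribution, and that neighboring bins tend to share a state. The function names, the Beta parameters in `PARAMS`, the `stay` probability, and the toy counts are all hypothetical. The parameters are fixed by hand here, whereas the abstract describes estimating them with a constrained Newton's method.

```python
import math


def log_beta(x, y):
    """Log of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_emission(k, n, a, b):
    """Beta-binomial log-probability of k IP reads out of n total reads.

    The binomial coefficient is omitted because it is identical for every
    hidden state and therefore does not affect the decoded path.
    """
    return log_beta(k + a, n - k + b) - log_beta(a, b)


def viterbi(bins, params, stay=0.9):
    """Most likely state path (0 = background, 1 = peak) for a two-state HMM.

    bins   : list of (k, n) pairs, one per bin along the site
    params : [(a0, b0), (a1, b1)], assumed Beta parameters for each state
    stay   : assumed probability of remaining in the same state
    """
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    states = range(2)

    # Uniform initial distribution over the two states.
    k, n = bins[0]
    score = [math.log(0.5) + log_emission(k, n, *params[s]) for s in states]
    back = []

    for k, n in bins[1:]:
        new_score, pointers = [], []
        for s in states:
            # Best previous state for arriving in state s.
            cands = [score[p] + (log_stay if p == s else log_switch) for p in states]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + log_emission(k, n, *params[s]))
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical parameters: background centered on 0.5, peak centered on 0.9.
PARAMS = [(10.0, 10.0), (9.0, 1.0)]

# Toy site: seven bins of (IP reads, IP + input reads).
bins = [(10, 20), (9, 20), (18, 20), (19, 20), (17, 20), (10, 20), (11, 20)]
path = viterbi(bins, PARAMS)
print(path)
```

In this toy example the three strongly enriched bins in the middle are decoded as peak and the flanking bins as background. The `stay` probability penalizes isolated state changes, which is one simple way to express dependency between adjacent bins.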
The package 'MeTPeak' is implemented in R and C++, and additional details are available at https://github.com/compgenomics/MeTPeak
yufei.huang@utsa.edu or xdchoi@gmail.com
Supplementary data are available at Bioinformatics online.