Zhou Yuan, Zeng Pan, Li Yan-Hui, Zhang Ziding, Cui Qinghua
Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing 100191, China; MOE Key Lab of Molecular Cardiovascular Sciences, Peking University, Beijing 100191, China; Center for Noncoding RNA Medicine, Peking University Health Science Center, Beijing 100191, China.
Nucleic Acids Res. 2016 Jun 2;44(10):e91. doi: 10.1093/nar/gkw104. Epub 2016 Feb 20.
N(6)-methyladenosine (m(6)A) is a prevalent RNA methylation modification involved in regulating the degradation, subcellular localization, splicing and local conformation changes of RNA transcripts. High-throughput experiments have demonstrated that only a small fraction of the m(6)A consensus motifs in mammalian transcriptomes are modified. Accurate identification of RNA m(6)A sites is therefore of pressing importance. To this end, a computational predictor of mammalian m(6)A sites, named SRAMP, is established here. To depict the sequence context around m(6)A sites, SRAMP combines three random forest classifiers that exploit, respectively, the positional nucleotide sequence pattern, the K-nearest neighbor information and the position-independent nucleotide pair spectrum features. SRAMP accepts either genomic sequences or cDNA sequences as input. With either kind of input sequence, SRAMP achieves competitive performance in both cross-validation tests and rigorous independent benchmarking tests. Analyses of the informative features and overrepresented rules extracted from the random forest classifiers demonstrate that nucleotide usage preferences at the distal positions, in addition to those at the proximal positions, contribute to the classification. As a public prediction server, SRAMP is freely available at http://www.cuilab.cn/sramp/.
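The two sequence-derived feature types named above can be illustrated with a minimal Python sketch. It is not SRAMP's implementation: the function names, the one-hot encoding, the `max_gap` parameter and the plain averaging of classifier scores are all assumptions made for illustration, and the K-nearest neighbor feature is omitted because it requires a labelled training set.

```python
from itertools import product

NUCLEOTIDES = "ACGU"


def positional_binary_encoding(window):
    """One-hot encode every position of a sequence window (4 bits per base).

    Assumption: a position-specific binary encoding stands in for the
    'positional nucleotide sequence pattern' feature.
    """
    features = []
    for base in window.upper().replace("T", "U"):
        # Unknown bases such as 'N' are encoded as four zeros.
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def pair_spectrum(window, max_gap=2):
    """Frequencies of nucleotide pairs separated by 0..max_gap bases.

    Positions are ignored, so the result is position-independent.
    `max_gap` is a hypothetical parameter chosen for this sketch.
    """
    seq = window.upper().replace("T", "U")
    pairs = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(seq) - gap - 1):
            pair = seq[i] + seq[i + gap + 1]
            if pair in counts:  # skip pairs containing unknown bases
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


def combine_scores(scores):
    """Average the scores of several classifiers.

    Assumption: a plain mean; the abstract does not state how the three
    random forest outputs are combined.
    """
    return sum(scores) / len(scores)
```

Each feature vector would feed its own random forest classifier, and `combine_scores` would then merge the per-classifier scores for a candidate site into a single prediction score.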