Wu Xiaohui, Ji Guoli, Li Qingshun Quinn
Department of Automation, Xiamen University, 422 Siming South Road, Xiamen, Fujian, 361005, China
Methods Mol Biol. 2015;1255:13-23. doi: 10.1007/978-1-4939-2175-1_2.
Messenger RNA polyadenylation is one of the essential processing steps during eukaryotic gene expression. The site of polyadenylation [poly(A) site] marks the end of a transcript, which in most cases is also the end of a gene. A computational program that can recognize poly(A) sites would be useful not only for genome annotation, where it helps locate gene ends, but also for predicting alternative poly(A) sites. PASS [Poly(A) Site Sleuth] and PAC [Poly(A) site Classifier] were developed to predict poly(A) sites in plants. PASS is built on the Generalized Hidden Markov Model (GHMM) and consists of four functional modules: an input module, a poly(A) site recognition module, a graphic process module, and an output module. PAC is a classification model that integrates several features that define poly(A) sites, including K-gram patterns, the Z-curve, a position-specific scoring matrix, and a first-order inhomogeneous Markov sub-model. PAC can be used to predict poly(A) sites in species whose polyadenylation profile is unknown. Both PASS and PAC output several files, one of which contains the score or probability of being a poly(A) site for each position of a given sequence. Although the models were built mostly on poly(A) profile data from Arabidopsis, they are also functional in other higher plants because their poly(A) profiles are quite similar.
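As an illustration of one of the sequence features named above, the following is a minimal sketch of the standard cumulative Z-curve transform, which maps a nucleotide sequence to three coordinates per position (purine vs. pyrimidine, amino vs. keto, weak vs. strong hydrogen bonding). The function name `z_curve` and the handling of ambiguous bases are assumptions made for this example; the abstract does not say how PAC computes or uses the Z-curve, so this should not be read as PAC's implementation.

```python
def z_curve(seq):
    """Return the cumulative Z-curve point (x, y, z) for each position of seq.

    x: purines (A, G) minus pyrimidines (C, T)
    y: amino bases (A, C) minus keto bases (G, T)
    z: weak H-bond bases (A, T) minus strong H-bond bases (C, G)
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    # Treat RNA input as DNA by mapping U to T.
    for base in seq.upper().replace("U", "T"):
        # Assumption for this sketch: ambiguous symbols (e.g. N) do not change
        # the counts, so the previous coordinates are repeated at that position.
        if base in counts:
            counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (c + g)))
    return points


if __name__ == "__main__":
    # A short, made-up sequence used only to show the call.
    for position, point in enumerate(z_curve("AATGCA"), start=1):
        print(position, point)
```

In a classifier, coordinates like these would typically be computed over a window around each candidate position and combined with the other features; the window size and the way the features are combined are not given in the abstract.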