Tolstrup N, Rouzé P, Brunak S
Center for Biological Sequence Analysis, Department of Chemistry, The Technical University of Denmark, Building 206, DK-2800 Lyngby, Denmark.
Nucleic Acids Res. 1997 Aug 1;25(15):3159-63. doi: 10.1093/nar/25.15.3159.
Little is known about branch points in plants; it has even been claimed that plant introns lack conserved branch point sequences similar to those found in vertebrate introns. A putative branch point consensus sequence for Arabidopsis thaliana resembling the well-known metazoan consensus has been proposed, but it was derived by searching for sequences similar to those in yeast and metazoa. Here we present a novel consensus sequence found by a non-circular approach: a hidden Markov model with a fixed A nucleotide was trained on sequences upstream of the acceptor site. The consensus found by the model shares features with the metazoan consensus but differs in its details from the one proposed earlier. Although branch point consensus sequences in plants are weak, we show that a prediction scheme incorporating them substantially improves the recognition of true acceptor sites, reducing the false positive rate by a factor of 2. We take this as an indication that the consensus found here is the genuine one and that the branch point does play a role in the proper recognition of the acceptor site in plants.
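The abstract says only that a hidden Markov model with a fixed A was trained on sequences upstream of the acceptor site; it does not give the model's architecture. To illustrate the underlying idea (letting the data decide which A is the branch point instead of searching for a presupposed consensus), here is a minimal Python sketch that swaps in a simpler technique: an expectation-maximization (EM) search over A-centered windows, in the spirit of MEME's one-occurrence-per-sequence model. The window half-width, pseudocount, iteration count, uniform background and all function names (`train_profile`, `best_branch_point`, etc.) are hypothetical choices for illustration, not the authors' model or parameters.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

BASES = "ACGT"
BACKGROUND = 0.25  # assumption: uniform background nucleotide frequencies

Profile = List[Dict[str, float]]  # one base -> probability dict per window position


def a_centered_windows(seq: str, half: int) -> List[str]:
    """Return every ACGT-only window of width 2*half+1 whose middle base is A."""
    return [
        seq[i - half:i + half + 1]
        for i in range(half, len(seq) - half)
        if seq[i] == "A" and set(seq[i - half:i + half + 1]) <= set(BASES)
    ]


def window_score(profile: Profile, window: str) -> float:
    """Log-odds of a window against the background; the fixed central A is skipped."""
    half = len(profile) // 2
    return sum(
        math.log(profile[j][b] / BACKGROUND)
        for j, b in enumerate(window)
        if j != half
    )


def train_profile(seqs: Sequence[str], half: int = 3, iters: int = 25,
                  pseudo: float = 0.5) -> Profile:
    """Hypothetical EM (one branch point per sequence) over A-centered windows."""
    width = 2 * half + 1
    candidates = [a_centered_windows(s.upper(), half) for s in seqs]
    candidates = [c for c in candidates if c]  # drop sequences with no usable A
    # Flat start: the first E-step weights all candidate windows equally,
    # so no prior consensus is imposed on the search.
    profile: Profile = [{b: 0.25 for b in BASES} for _ in range(width)]
    for _ in range(iters):
        counts = [{b: pseudo for b in BASES} for _ in range(width)]
        for windows in candidates:
            # E-step: posterior probability that each window is the branch point.
            scores = [window_score(profile, w) for w in windows]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            for w, wt in zip(windows, weights):
                for j, b in enumerate(w):
                    counts[j][b] += wt / total
        # M-step: renormalize expected counts into per-position frequencies.
        profile = [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]
        profile[half] = {b: 1.0 if b == "A" else 0.0 for b in BASES}  # fixed A
    return profile


def consensus(profile: Profile) -> str:
    """Most probable base at each position of the profile."""
    return "".join(max(col, key=col.get) for col in profile)


def best_branch_point(upstream: str, profile: Profile) -> Optional[Tuple[int, float]]:
    """Index and log-odds score of the best candidate A, or None if there is none."""
    half = len(profile) // 2
    seq = upstream.upper()
    best: Optional[Tuple[int, float]] = None
    for i in range(half, len(seq) - half):
        window = seq[i - half:i + half + 1]
        if seq[i] != "A" or not set(window) <= set(BASES):
            continue
        score = window_score(profile, window)
        if best is None or score > best[1]:
            best = (i, score)
    return best
```

`best_branch_point` returns the highest-scoring A in an upstream region. In a prediction scheme like the one described, such a score could be combined with the output of an acceptor-site predictor, although how the authors combined the two is not stated here. EM from a flat start can settle in a local optimum, so results on real upstream sequences would need checking, for example from several perturbed starting profiles.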