Brendel V, Kleffe J
Department of Mathematics, Stanford University, Stanford, CA 94305, USA.
Nucleic Acids Res. 1998 Oct 15;26(20):4748-57. doi: 10.1093/nar/26.20.4748.
Prediction of splice site selection and efficiency from sequence inspection is of fundamental interest (testing the current knowledge of requisite sequence features) and practical importance (genome annotation, design of mutant or transgenic organisms). In plants, the dominant variables affecting splice site selection and efficiency include the degree of matching to the extended splice site consensus and the local gradient of U- and G+C-composition (introns being U-rich and exons G+C-rich). We present a novel method for splice site prediction, trained specifically for maize and Arabidopsis thaliana. The method extends our previous algorithm based on logitlinear models by considering three variables simultaneously: intrinsic splice site strength, local optimality and fit with respect to the overall splice pattern prediction. We show that the method considerably improves prediction specificity without compromising the high degree of sensitivity required in gene prediction algorithms. Applications to gene identification are illustrated for Arabidopsis and suggest that successful methods must combine scoring for splice sites, coding potential and similarity with potential homologs in non-trivial ways. A WWW version of the SplicePredictor program is available at http://gnomic.stanford.edu/volker/SplicePredictor.html/
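The compositional contrast mentioned above (introns U-rich, exons G+C-rich) can be illustrated with a minimal sketch. This is not the SplicePredictor model: the function names `composition` and `donor_contrast`, the window size, and the toy sequence are all hypothetical choices made for illustration, and the abstract does not specify how the published method measures the gradient.

```python
def composition(seq):
    """Return (U fraction, G+C fraction) of a sequence; T is counted as U."""
    s = seq.upper().replace("T", "U")
    if not s:
        return 0.0, 0.0
    u = s.count("U") / len(s)
    gc = (s.count("G") + s.count("C")) / len(s)
    return u, gc


def donor_contrast(seq, pos, window=50):
    """Composition contrast around a candidate donor site.

    `pos` is the index of the G in the candidate GU dinucleotide.
    The upstream window is taken as the exon side and the downstream
    window as the intron side. The window size is an assumed value,
    not one taken from the paper.

    Returns (delta_u, delta_gc), where
      delta_u  = U fraction downstream minus U fraction upstream
      delta_gc = G+C fraction upstream minus G+C fraction downstream
    so both values are positive when the intron side is U-rich and
    the exon side is G+C-rich.
    """
    upstream = seq[max(0, pos - window):pos]
    downstream = seq[pos:pos + window]
    u_up, gc_up = composition(upstream)
    u_down, gc_down = composition(downstream)
    return u_down - u_up, gc_up - gc_down


# Toy sequence (hypothetical): a G+C-rich "exon" followed by a U-rich "intron".
toy = "GCCGCGGCCG" + "GUAUUUUAUU"
d_u, d_gc = donor_contrast(toy, 10, window=10)
print(d_u, d_gc)
```

Both differences are positive for the toy sequence, which is the direction of contrast the abstract describes for a genuine exon-intron boundary. A real predictor would combine such terms with the consensus match in a fitted model rather than use them raw.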