Gelfand M S
Institute of Protein Research, USSR Academy of Sciences, Pushchino, Moscow region.
Nucleic Acids Res. 1990 Oct 11;18(19):5865-9. doi: 10.1093/nar/18.19.5865.
A novel approach to predicting protein-coding regions is suggested. It combines site prediction methods, which predict splice sites, with global coding-region prediction methods, which choose the best variant of the spliced mRNA. One advantage of the suggested algorithm is that the resulting mRNA or protein sequence can be analyzed further immediately. The true mRNA either coincides with the predicted one or ranks high in the list of variants. In the latter case the predicted mRNA usually differs from the true one in only one or two of several exons. The combined approach allows the use of a priori information (e.g. the putative protein length or the number of exons). Additional parameters not considered here could also be used, such as the preferred lengths of exons and introns and, in particular, the preferred position of introns in the reading frame and the preferred codon position of exon termini.
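The combined scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. Everything in it is an assumption made for illustration: the toy site predictor (every GT is a candidate donor, every AG a candidate acceptor), the toy global measure (mean per-codon weight from a hypothetical `codon_weights` table), and all function names. The `max_introns` parameter stands in for a priori information such as the number of exons.

```python
from itertools import combinations

STOPS = {"TAA", "TAG", "TGA"}


def candidate_sites(seq):
    """Toy site predictor (assumption): any GT is a donor, any AG an acceptor."""
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    # An acceptor position is the index of the first exon base after the AG.
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


def enumerate_intron_sets(donors, acceptors, max_introns, min_intron=4):
    """Yield tuples of non-overlapping introns as (start, end), end exclusive."""
    introns = sorted((d, a) for d in donors for a in acceptors
                     if a - d >= min_intron)
    for k in range(max_introns + 1):
        for combo in combinations(introns, k):
            # combo is sorted by start, so checking neighbours is sufficient.
            if all(combo[i][1] <= combo[i + 1][0] for i in range(k - 1)):
                yield combo


def splice(seq, introns):
    """Remove the given introns and return the spliced sequence."""
    parts, pos = [], 0
    for start, end in introns:
        parts.append(seq[pos:start])
        pos = end
    parts.append(seq[pos:])
    return "".join(parts)


def coding_score(mrna, codon_weights):
    """Toy global coding measure (assumption): mean codon weight in frame 0.

    Returns None if the length is not a multiple of 3 or if a stop codon
    occurs before the last codon.
    """
    if not mrna or len(mrna) % 3 != 0:
        return None
    codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]
    if any(c in STOPS for c in codons[:-1]):
        return None
    return sum(codon_weights.get(c, 0.0) for c in codons) / len(codons)


def rank_variants(seq, codon_weights, max_introns=2):
    """Return (score, introns, mrna) tuples, best-scoring variant first."""
    donors, acceptors = candidate_sites(seq)
    scored = []
    for introns in enumerate_intron_sets(donors, acceptors, max_introns):
        mrna = splice(seq, introns)
        score = coding_score(mrna, codon_weights)
        if score is not None:
            scored.append((score, introns, mrna))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical example: two exons separated by the intron GTAAAG.
genomic = "ATGGCT" + "GTAAAG" + "GCTTAA"
weights = {"ATG": 1.0, "GCT": 2.0}  # made-up codon weights
ranked = rank_variants(genomic, weights)
for score, introns, mrna in ranked:
    print(round(score, 3), introns, mrna)
```

Because the function returns the whole ranked list rather than a single answer, a caller can inspect the runner-up variants, which matches the abstract's remark that the true mRNA may rank high in the list without being first. Exhaustive enumeration is used here only for brevity and would not scale to real genes.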