Shapiro M B, Senapathy P
Laboratory of Statistical and Mathematical Methodology, National Institutes of Health, Bethesda, MD 20892.
Nucleic Acids Res. 1987 Sep 11;15(17):7155-74. doi: 10.1093/nar/15.17.7155.
A systematic analysis of the RNA splice junction sequences of eukaryotic protein-coding genes was carried out using the GENBANK databank. Nucleotide frequencies obtained for the highly conserved regions around the splice sites agree closely across different categories of organisms. A striking similarity among the rare splice junctions that do not contain AG at the 3' splice site or GT at the 5' splice site indicates that special mechanisms exist to recognize them, and that these unique signals may be involved in crucial gene-regulation events and in differentiation. A method was developed to predict potential exons in a bare sequence, using a scoring and ranking scheme based on nucleotide weight tables. This method found a majority of the exons in selected known genes and also predicted potential new exons that may be used in alternative splicing.
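The weight-table scoring and ranking idea mentioned in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the five training sequences are invented toy data, the 9-nucleotide window (3 exon bases followed by 6 intron bases) is one plausible choice for a 5' splice site, the 0 to 100 normalization is a common convention rather than a confirmed formula, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy training set: 9-nt windows around a 5' splice site
# (3 exon bases + 6 intron bases). Invented for illustration only;
# these are NOT the frequencies reported in the paper.
TOY_DONOR_SITES = [
    "CAGGTAAGT",
    "AAGGTGAGT",
    "CAGGTAAGA",
    "TAGGTATGT",
    "CTGGTGAGC",
]


def build_weight_table(sites):
    """Return one dict per position mapping base -> percent frequency."""
    length = len(sites[0])
    table = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        table.append({b: 100.0 * counts[b] / len(sites) for b in "ACGT"})
    return table


def score_window(window, table):
    """Sum the per-position weights and rescale to the range 0..100.

    The rescaling (raw - min) / (max - min) is an assumed normalization.
    Bases outside ACGT (for example N) contribute a weight of 0.
    """
    raw = sum(col.get(base, 0.0) for base, col in zip(window, table))
    lo = sum(min(col.values()) for col in table)
    hi = sum(max(col.values()) for col in table)
    return 100.0 * (raw - lo) / (hi - lo)


def rank_candidates(sequence, table, top=3):
    """Score every window with GT at the assumed intron start; best first."""
    width = len(table)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if window[3:5] != "GT":  # keep only canonical GT candidates
            continue
        hits.append((score_window(window, table), i, window))
    return sorted(hits, reverse=True)[:top]


if __name__ == "__main__":
    table = build_weight_table(TOY_DONOR_SITES)
    for score, offset, window in rank_candidates("TTTCAGGTAAGTCCC", table):
        print(f"{offset:3d}  {window}  {score:6.1f}")
```

A real application would build the table from a curated set of annotated splice junctions, and would score 3' splice sites with a separate table before pairing high-ranking sites into candidate exons.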