Xia Huiyu, Bi Jianning, Li Yanda
Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China.
Nucleic Acids Res. 2006;34(21):6305-13. doi: 10.1093/nar/gkl900. Epub 2006 Nov 10.
Alternative splicing plays an important role in regulating gene expression. Currently, the most efficient methods for large-scale detection of alternative splicing rely on expressed sequence tags or microarray analysis. However, the inherent limitations of these techniques make it difficult to detect all alternative splice events. Previous computational methods for alternative splicing prediction could only predict particular kinds of alternative splice events. Thus, it would be highly desirable to predict alternative 5'/3' splice sites with various splicing levels using genomic sequences alone. Here, we introduce the competition mechanism of splice site selection into alternative splice site prediction. This approach allows us to predict not only rarely used but also frequently used alternative splice sites. On a dataset extracted from the AltSplice database, our method correctly classified approximately 70% of the splice sites as alternative or constitutive, and correctly identified approximately 80% of the locations of real competitors for alternative splice sites. It outperforms a method that considers only features extracted from the splice sites themselves. Furthermore, this approach can also predict the changes in activation level of a given splice site that arise from mutations in its flanking cryptic splice sites. Our approach might be useful for studying alternative splicing in both computational and molecular biology.