Department of Computer Science, Dokuz Eylul University, Izmir, Turkey. efendi
Comput Biol Chem. 2010 Dec;34(5-6):293-9. doi: 10.1016/j.compbiolchem.2010.10.003. Epub 2010 Oct 14.
Predicting the complete structure of genes is one of the most important tasks in bioinformatics, especially in eukaryotes. A crucial part of gene structure prediction is determining the splice sites in the coding region. Identification of splice sites depends on the precise recognition of the boundaries between exons and introns in a given DNA sequence. This problem can be formulated as a classification of sequence elements into 'exon-intron' (EI), 'intron-exon' (IE) or 'None' (N) boundary classes. In this study we propose a new Weighted Position Specific Scoring Method (WPSSM) to recognize splice sites, which uses a position-specific scoring matrix constructed from nucleotide base frequencies. A genetic algorithm is used to tune the weight and threshold parameters of the positions. The method consists of two phases: a learning phase and an identification phase. The proposed WPSSM gives efficient results compared with the performance of many methods proposed in the literature. Computational experiments are performed on the DNA sequence datasets from the 'UCI Repository of machine learning databases'.
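To make the scoring idea concrete, the following is a minimal Python sketch of a weighted position-specific score, not the authors' implementation. It assumes a log-odds matrix against a uniform background, a pseudocount of 1, and a decision rule that picks the class whose score exceeds its threshold by the largest margin. The training windows, weights, and thresholds are invented toy values set by hand; in the paper they are tuned by a genetic algorithm, which is omitted here. All function names are hypothetical.

```python
import math

BASES = "ACGT"

def build_pssm(sequences, pseudocount=1.0):
    """Learning phase (sketch): log-odds matrix from aligned, equal-length windows.
    Assumes a uniform background frequency of 0.25 per base."""
    pssm = []
    for pos in range(len(sequences[0])):
        counts = {b: pseudocount for b in BASES}
        for seq in sequences:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pssm

def weighted_score(seq, pssm, weights):
    """Sum of per-position matrix entries, each scaled by its position weight."""
    return sum(w * col[base] for base, col, w in zip(seq, pssm, weights))

def classify(seq, models):
    """Identification phase (sketch): return 'EI', 'IE', or 'N'.
    models maps a label to (pssm, weights, threshold); this rule is an assumption."""
    best_label, best_margin = "N", 0.0
    for label, (pssm, weights, threshold) in models.items():
        margin = weighted_score(seq, pssm, weights) - threshold
        if margin >= best_margin:
            best_label, best_margin = label, margin
    return best_label

# Toy 8-base windows around each boundary type (invented for illustration).
ei_windows = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
ie_windows = ["TTTCAGGT", "CTTCAGGA", "TTCCAGGT", "TCTCAGGC"]

# Hand-picked weights and thresholds standing in for genetic-algorithm output.
models = {
    "EI": (build_pssm(ei_windows), [1, 1, 2, 2, 1, 1, 1, 1], 4.0),
    "IE": (build_pssm(ie_windows), [1, 1, 1, 1, 2, 2, 1, 1], 4.0),
}

for window in ["AGGTAAGT", "TTTCAGGT", "ACACACAC"]:
    print(window, classify(window, models))
```

Raising the weight of a position increases the influence of that column on the final score, which is why tuning the weights together with the thresholds can change which windows are accepted as splice sites.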