School of Pharmacy, Guangdong Pharmaceutical University, Guangzhou 510006, PR China.
J Theor Biol. 2013 Feb 21;319:1-7. doi: 10.1016/j.jtbi.2012.11.024. Epub 2012 Dec 2.
In this paper, the discrete wavelet transform was introduced into trinucleotide compositions, and a novel DNA sequence representation technique, namely pseudo-trinucleotide composition, was proposed. The pseudo-trinucleotide compositions based on the discrete wavelet transform were then used as input features for support vector machine (SVM) models for the prediction of promoters. The model was evaluated on the genie dataset. The overall prediction accuracy (ACC) by jackknife validation was 82.46% for the classification of promoters, introns and exons, and 82.18% for the classification of promoters versus non-promoters, which was far better than previous results. These satisfactory prediction results indicate that the pseudo-trinucleotide composition based on the discrete wavelet transform is an effective representation method for DNA sequences and plays an important role in the prediction of DNA function.
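The abstract does not spell out how the wavelet transform is applied, so the following is only a minimal sketch of the general idea, not the authors' exact method. It assumes overlapping trinucleotide counting, a Haar wavelet, and three decomposition levels; the function names (`trinucleotide_composition`, `haar_dwt`, `wavelet_features`) are hypothetical and chosen for illustration.

```python
from itertools import product
import math

# All 64 trinucleotides in a fixed (lexicographic) order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_composition(seq):
    """Return the 64 normalized frequencies of overlapping trinucleotides."""
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows containing ambiguous bases such as N
            counts[tri] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[t] / total for t in TRINUCLEOTIDES]


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_features(seq, levels=3):
    """Assumed feature layout: coarsest approximation, then details coarse to fine."""
    approx = trinucleotide_composition(seq)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    features = list(approx)
    for detail in reversed(details):
        features.extend(detail)
    return features
```

Under these assumptions each sequence maps to a fixed-length vector of 64 wavelet coefficients, regardless of sequence length, which could then be passed to any SVM implementation as its feature vector. The choice of wavelet family and decomposition depth is an assumption here and would need to follow the full paper.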