An approach of encoding for prediction of splice sites using SVM.

Author information

Huang J, Li T, Chen K, Wu J

Affiliation

Department of Chemistry, Tongji University, Shanghai, China.

Publication information

Biochimie. 2006 Jul;88(7):923-9. doi: 10.1016/j.biochi.2006.03.006. Epub 2006 Apr 3.

Abstract

In splice site prediction, accuracy remains below 90% even though the sequences adjacent to splice sites are highly conserved. Efforts to improve accuracy have focused mainly on the performance of the algorithms used, and few have addressed a more fundamental issue, namely nucleotide encoding. In this paper, a predictor based on support vector machines (SVM) is constructed to distinguish true from false splice sites in higher eukaryotes. Four types of encoding were applied to generate the input for the SVM: mono-nucleotide (MN) encoding, MN with frequency difference between the true sites and false sites (FDTF) encoding, pair-wise nucleotide (PN) encoding, and PN with FDTF encoding. The results showed that PN with FDTF encoding as input to the SVM gave the most reliable recognition of splice sites: the accuracies for predicting true donor sites and false sites were 96.3% and 93.7%, respectively, and those for predicting true acceptor sites and false sites were 94.0% and 93.2%, respectively.
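
The abstract names the four encodings but does not define them, so the following Python sketch shows only one plausible reading. It assumes that MN encoding is a 4-value one-hot code per nucleotide, that PN encoding is a 16-value one-hot code per adjacent nucleotide pair, and that FDTF replaces each active one-hot entry with the difference between that symbol's positional frequency in true sites and in false sites. All function names and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter

ALPHABET = "ACGT"
PAIRS = [a + b for a in ALPHABET for b in ALPHABET]  # 16 dinucleotides


def mn_encode(seq):
    """Assumed MN encoding: 4 one-hot values per position."""
    return [1.0 if base == ref else 0.0 for base in seq for ref in ALPHABET]


def pn_encode(seq):
    """Assumed PN encoding: 16 one-hot values per adjacent pair."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [1.0 if pair == ref else 0.0 for pair in pairs for ref in PAIRS]


def position_frequencies(seqs, symbols, width):
    """Relative frequency of each symbol at each position (width 1 or 2)."""
    n_pos = len(seqs[0]) - width + 1
    freqs = []
    for i in range(n_pos):
        counts = Counter(s[i:i + width] for s in seqs)
        freqs.append({sym: counts[sym] / len(seqs) for sym in symbols})
    return freqs


def fdtf_weights(true_seqs, false_seqs, symbols, width):
    """Assumed FDTF: per-position frequency in true sites minus false sites."""
    f_true = position_frequencies(true_seqs, symbols, width)
    f_false = position_frequencies(false_seqs, symbols, width)
    return [{sym: ft[sym] - ff[sym] for sym in symbols}
            for ft, ff in zip(f_true, f_false)]


def encode_with_fdtf(seq, weights, symbols, width):
    """Put the FDTF weight of the observed symbol in its one-hot slot."""
    vec = []
    for i, w in enumerate(weights):
        observed = seq[i:i + width]
        vec.extend(w[sym] if sym == observed else 0.0 for sym in symbols)
    return vec


# Made-up 6-nt windows around a GT dinucleotide, for illustration only.
true_sites = ["AGGTAA", "AGGTGA", "CGGTAA"]
false_sites = ["ACGTCC", "TTGTAC", "AGGTCC"]

pn_weights = fdtf_weights(true_sites, false_sites, PAIRS, 2)
features = encode_with_fdtf("AGGTAA", pn_weights, PAIRS, 2)  # PN with FDTF
```

Under these assumptions, each window becomes a fixed-length numeric vector that could be passed to any SVM implementation. In practice the frequency tables would be estimated on training data only, so that the weights do not leak information from the test set.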
