Splice site identification using probabilistic parameters and SVM classification.

Author information

Baten A K M A, Chang B C H, Halgamuge S K, Li Jason

Affiliation

Dynamic Systems and Control Research Group, DoMME, The University of Melbourne, Victoria 3010, Australia.

Publication information

BMC Bioinformatics. 2006 Dec 18;7 Suppl 5(Suppl 5):S15. doi: 10.1186/1471-2105-7-S5-S15.

Abstract

BACKGROUND

Recent advances and automation in DNA sequencing technology have created a vast amount of DNA sequence data. This growth in sequence data demands better and more efficient analysis methods. Identifying genes in this newly accumulated data is an important problem in bioinformatics, and it requires prediction of the complete gene structure. Accurate identification of splice sites in DNA sequences plays a central role in gene structure prediction in eukaryotes. Effective detection of splice sites requires knowledge of the characteristics, dependencies, and relationships of nucleotides in the region surrounding the splice site. Higher-order Markov models are generally regarded as a useful technique for modeling higher-order dependencies. However, their implementation requires estimating a large number of parameters, which is computationally expensive.
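To make the parameter cost concrete: a Markov model of order k over the four-letter DNA alphabet conditions each nucleotide on the k preceding ones, so every modeled position has 4^k contexts with 4 possible outcomes each. The sketch below only counts those conditional probabilities per position. The function name is hypothetical, and the count is taken before subtracting the sum-to-one constraint on each context.

```python
def conditional_probabilities_per_position(order: int, alphabet_size: int = 4) -> int:
    """Number of conditional probabilities P(x_i | previous `order` symbols)
    that a position-specific Markov model stores for one position."""
    if order < 0:
        raise ValueError("order must be non-negative")
    # alphabet_size**order contexts, each with alphabet_size outcomes
    return alphabet_size ** (order + 1)


if __name__ == "__main__":
    for k in range(0, 6):
        print(k, conditional_probabilities_per_position(k))
```

The count grows by a factor of four with each additional order, which is the cost that a first-order model avoids.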

RESULTS

The proposed method for splice site detection consists of two stages: a first-order Markov model (MM1) is used in the first stage, and a support vector machine (SVM) with a polynomial kernel is used in the second stage. The MM1 serves as a pre-processing step for the SVM and takes DNA sequences as its input. It models the compositional features and dependencies of nucleotides around splice site regions in terms of probabilistic parameters. The probabilistic parameters are then fed into the SVM, which combines them nonlinearly to predict splice sites. When the proposed MM1-SVM model is compared with other existing standard splice site detection methods, it shows superior performance in all cases.
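The first stage can be illustrated with a minimal sketch, under assumptions the abstract does not spell out: the model is taken to be position-specific, it is estimated from equal-length aligned sequences with add-one smoothing, and each sequence is encoded as the vector of first-order conditional probabilities of its observed nucleotides. The names `train_mm1` and `encode` and the `pseudocount` parameter are hypothetical, and this is not presented as the authors' exact procedure.

```python
ALPHABET = "ACGT"


def train_mm1(sequences, pseudocount=1.0):
    """Estimate a position-specific first-order Markov model.

    Returns (initial, transitions), where initial[b] approximates P(s_0 = b)
    and transitions[i - 1][(a, b)] approximates P(s_i = b | s_{i-1} = a).
    Assumes all sequences have the same length and use only A, C, G, T.
    """
    length = len(sequences[0])
    initial_counts = {b: pseudocount for b in ALPHABET}
    transition_counts = [
        {(a, b): pseudocount for a in ALPHABET for b in ALPHABET}
        for _ in range(length - 1)
    ]
    for seq in sequences:
        initial_counts[seq[0]] += 1
        for i in range(1, length):
            transition_counts[i - 1][(seq[i - 1], seq[i])] += 1

    total = sum(initial_counts.values())
    initial = {b: c / total for b, c in initial_counts.items()}
    transitions = []
    for counts in transition_counts:
        # Normalize each context (previous nucleotide) to sum to one.
        row_totals = {a: sum(counts[(a, b)] for b in ALPHABET) for a in ALPHABET}
        transitions.append(
            {(a, b): counts[(a, b)] / row_totals[a] for (a, b) in counts}
        )
    return initial, transitions


def encode(seq, model):
    """Map a DNA sequence to its vector of probabilistic parameters."""
    initial, transitions = model
    features = [initial[seq[0]]]
    for i in range(1, len(seq)):
        features.append(transitions[i - 1][(seq[i - 1], seq[i])])
    return features


if __name__ == "__main__":
    # Toy aligned windows, invented for illustration only.
    model = train_mm1(["AGGT", "AGGT", "CGGT"])
    print(encode("AGGT", model))
```

In the second stage, vectors like these would be passed to an SVM with a polynomial kernel, for example scikit-learn's `SVC(kernel="poly")`. That step is left out here to keep the sketch free of third-party dependencies.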

CONCLUSION

We proposed an effective pre-processing scheme for the SVM and applied it to the identification of splice sites. This is a simple yet effective splice site detection method that achieves better classification accuracy and computational speed than some more complex methods.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f97/1764471/d86fc5876d9e/1471-2105-7-S5-S15-1.jpg