Exploiting mid-range DNA patterns for sequence classification: binary abstraction Markov models.

Affiliation

Department of Medicine, University of Toledo, Health Science Campus, Toledo, OH 43614, USA.

Publication information

Nucleic Acids Res. 2012 Jun;40(11):4765-73. doi: 10.1093/nar/gks154. Epub 2012 Feb 16.

Abstract

Messenger RNA sequences possess specific nucleotide patterns that distinguish them from non-coding genomic sequences. In this study, we explore the use of modified Markov models to analyze sequences of up to 44 bp, far beyond the 8-bp limit of conventional Markov models, for exon/intron discrimination. To analyze nucleotide sequences of this length, their information content is first reduced by converting them into shorter binary patterns through numerous abstraction schemes. After the genomic sequences are converted to binary strings, homogeneous Markov models trained on the binary sequences are used to discriminate between exons and introns. We term this approach the Binary Abstraction Markov Model (BAMM). High-quality abstraction schemes for exon/intron discrimination are selected using optimization algorithms on supercomputers. The best Markov model (MM) classifiers are then combined using support vector machines into a single classifier. With this approach, over 95% classification accuracy is achieved without taking reading frame into account. With further development, the BAMM approach can be applied to sequences lacking the genetic code, such as ncRNAs and 5'-untranslated regions.
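The pipeline described in the abstract (abstract to binary, train one homogeneous Markov model per class, compare likelihoods) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration, not the authors' method: the abstract does not define the abstraction schemes, so the sketch uses a hypothetical per-base purine/pyrimidine mapping (which does not shorten the sequence as the paper's schemes do), add-one smoothing, a log-likelihood-ratio decision rule, and toy training sequences. The scheme optimization and the support vector machine combination step are not shown.

```python
import math
from collections import defaultdict

# Hypothetical abstraction scheme (an assumption, not from the paper):
# purines (A, G) -> "1", pyrimidines (C, T) -> "0".
PURINE_SCHEME = {"A": "1", "G": "1", "C": "0", "T": "0"}


def abstract(seq, scheme=PURINE_SCHEME):
    """Convert a nucleotide string into a binary string under a scheme."""
    return "".join(scheme[base] for base in seq.upper())


def train_markov(binary_seqs, order):
    """Estimate a homogeneous order-k Markov model over {0, 1}.

    Returns a dict mapping each observed context (a k-character string)
    to the pair (P(next=0 | context), P(next=1 | context)).
    Add-one pseudocounts keep every probability above zero.
    """
    counts = defaultdict(lambda: [1, 1])
    for s in binary_seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][int(s[i])] += 1
    return {
        ctx: (c[0] / (c[0] + c[1]), c[1] / (c[0] + c[1]))
        for ctx, c in counts.items()
    }


def log_likelihood(model, s, order):
    """Sum of log transition probabilities; unseen contexts count as 50/50."""
    total = 0.0
    for i in range(order, len(s)):
        probs = model.get(s[i - order:i], (0.5, 0.5))
        total += math.log(probs[int(s[i])])
    return total


def classify(seq, exon_model, intron_model, order):
    """Label a sequence by the sign of the exon/intron log-likelihood ratio."""
    b = abstract(seq)
    score = log_likelihood(exon_model, b, order) - log_likelihood(intron_model, b, order)
    return ("exon" if score > 0 else "intron"), score


# Toy training data, invented purely for illustration.
ORDER = 2
exon_model = train_markov([abstract("AGGAAGGAGA")], ORDER)
intron_model = train_markov([abstract("ACACACAC")], ORDER)

label, score = classify("GGAAGG", exon_model, intron_model, ORDER)
print(label, round(score, 3))
```

Working on a two-letter alphabet means an order-k model has 2^k contexts instead of 4^k, which is what makes much longer contexts tractable in principle; a real scheme would be chosen by optimization rather than fixed by hand as it is here.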

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a667/3367190/3ed9ad55fde3/gks154f1.jpg
