Universal Features for the Classification of Coding and Non-coding DNA Sequences.

Authors

Carels Nicolas, Vidal Ramon, Frías Diego

Affiliation

Fundação Oswaldo Cruz (FIOCRUZ), Instituto Oswaldo Cruz (IOC), Laboratório de Genômica Funcional e Bioinformática, Rio de Janeiro, RJ, Brazil.

Publication information

Bioinform Biol Insights. 2009 Jun 3;3:37-49. doi: 10.4137/bbi.s2236.

Abstract

In this report, we revisit simple features that allow coding sequences (CDS) to be distinguished from non-coding DNA. Our sequence sample spans a wide spectrum of codon usage, which suggests that these features are universal. The features we investigated combine (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of cytosine, guanine, and adenine probabilities in the 1st, 2nd, and 3rd positions of triplets, respectively, and (iv) the product of G and C probabilities in the 1st and 2nd positions of triplets. These features are a natural consequence of the physico-chemical properties of proteins, and their combination classifies CDS and non-coding DNA (introns) with a success rate >95% for sequences longer than 350 bp. The coding strand and coding frame are implicitly deduced when a sequence is classified as coding.
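
The four features listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`position_probs`, `frame_features`) are hypothetical, "stop codon distribution" is reduced here to a simple in-frame stop codon count, probabilities are estimated as plain per-position frequencies, and feature (iv) is assumed to mean G in the 1st position and C in the 2nd.

```python
from collections import Counter

# Standard stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def position_probs(seq, frame=0):
    """Split seq into triplets starting at `frame` and return the triplets
    plus the A/C/G/T frequencies at each of the three triplet positions."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    n = len(codons)
    if n == 0:
        raise ValueError("sequence too short for the requested frame")
    counts = [Counter(c[pos] for c in codons) for pos in range(3)]
    probs = [{b: counts[pos][b] / n for b in "ACGT"} for pos in range(3)]
    return codons, probs


def frame_features(seq, frame=0):
    """Compute the four abstract-level features for one reading frame.
    Assumption: feature (i) is simplified to a raw stop codon count."""
    codons, p = position_probs(seq, frame)
    purine = [p[pos]["A"] + p[pos]["G"] for pos in range(3)]
    return {
        # (i) in-frame stop codons (simplified stand-in for the distribution)
        "stop_count": sum(c in STOP_CODONS for c in codons),
        # (ii) product of purine probabilities over the three positions
        "purine_product": purine[0] * purine[1] * purine[2],
        # (iii) P(C at 1st) * P(G at 2nd) * P(A at 3rd)
        "cga_product": p[0]["C"] * p[1]["G"] * p[2]["A"],
        # (iv) P(G at 1st) * P(C at 2nd) (assumed reading of the abstract)
        "gc_product": p[0]["G"] * p[1]["C"],
    }


if __name__ == "__main__":
    # Toy sequence for illustration only; real use needs sequences >350 bp.
    for frame in range(3):
        print(frame, frame_features("ATGGCCTAAGGCGCAGAA", frame))
```

Evaluating `frame_features` over the three frames of a sequence and of its reverse complement would give six feature sets to compare, which is one plausible way the coding strand and frame could fall out of the classification; how the paper actually combines the features and sets its thresholds is not stated in the abstract.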

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/febb/2808180/d0e5a93e67c9/bbi-2009-037f1.jpg
