Primary sequences of proteins from complete genomes display a singular periodicity: Alignment-free N-gram analysis.

Author information

Radomski Jan P, Slonimski Piotr P

Affiliations

Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, Pawińskiego 5A, Bldg. D, 02106 Warsaw, Poland.

Publication information

C R Biol. 2007 Jan;330(1):33-48. doi: 10.1016/j.crvi.2006.11.001. Epub 2006 Dec 1.

DOI:10.1016/j.crvi.2006.11.001

PMID:17241946

Abstract

A method is proposed to represent and analyze complete genome sequences (52 species, both procaryotes and eukaryotes), based on the n-gram frequencies of amino acid pairs (bigrams) separated by a given number of other residues. For each species analyzed, it allows the construction of over-abundant and over-deficient occurrence profiles that summarize amino acid bigram frequencies over the entire genome. The method deals efficiently with the sparseness of statistical representations of individual sequences, and describes every gene sequence in the same way, independently of its length and of the genome size. The frequency of over-abundant and over-deficient bigram occurrences shows a singular periodicity of about 3.5 peptide bonds, suggesting a relation to alpha-helical secondary structure.
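To make the idea of gapped bigrams concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`gapped_bigram_counts`, `observed_over_expected`) are hypothetical, and the observed/expected ratio under an independence assumption is only one plausible way to flag over-abundant (ratio above 1) and over-deficient (ratio below 1) pairs. The abstract does not specify the statistic actually used.

```python
from collections import Counter


def gapped_bigram_counts(sequences, gap):
    """Count amino acid pairs (a, b) where b follows a with `gap` residues in between.

    gap=0 gives adjacent pairs; gap=1 skips one residue, and so on.
    """
    step = gap + 1
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - step):
            counts[(seq[i], seq[i + step])] += 1
    return counts


def observed_over_expected(sequences, gap):
    """Ratio of observed pair counts to counts expected if residues were independent.

    Assumption (not from the source): expected count for (a, b) is
    total_pairs * freq(a) * freq(b), with frequencies pooled over all sequences.
    """
    pair_counts = gapped_bigram_counts(sequences, gap)
    total_pairs = sum(pair_counts.values())
    residue_counts = Counter("".join(sequences))
    total_residues = sum(residue_counts.values())

    ratios = {}
    for (a, b), observed in pair_counts.items():
        freq_a = residue_counts[a] / total_residues
        freq_b = residue_counts[b] / total_residues
        ratios[(a, b)] = observed / (total_pairs * freq_a * freq_b)
    return ratios
```

Pooling all protein sequences of a genome into `sequences` and repeating the calculation for a range of `gap` values would give one profile per separation distance, which is the kind of data in which a periodicity across separations could be looked for.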

Similar articles

The origins of modern proteomes.

Biochimie. 2007 Dec;89(12):1454-63. doi: 10.1016/j.biochi.2007.09.004. Epub 2007 Sep 15.

Periodic oscillations of the genomic nucleotide sequences disclose major differences in the way of constructing homologous proteins from different procaryotic species.

C R Biol. 2007 Jan;330(1):13-32. doi: 10.1016/j.crvi.2006.07.002. Epub 2006 Oct 3.

Amino acid coupling patterns in thermophilic proteins.

Proteins. 2005 Apr 1;59(1):58-63. doi: 10.1002/prot.20386.

Evolution of prokaryotic subtilases: genome-wide analysis reveals novel subfamilies with different catalytic residues.

Proteins. 2007 May 15;67(3):681-94. doi: 10.1002/prot.21290.

Identification and analysis of a new family of bacterial serine proteinases.

In Silico Biol. 2004;4(4):563-72.

The stability of thermophilic proteins: a study based on comprehensive genome comparison.

Funct Integr Genomics. 2000 May;1(1):76-88. doi: 10.1007/s101420000003.

Indicators from archaeal secretomes.

Microbiol Res. 2010;165(1):1-10. doi: 10.1016/j.micres.2008.03.002. Epub 2008 Apr 14.

Environment specific substitution tables for thermophilic proteins.

BMC Bioinformatics. 2007 Mar 8;8 Suppl 1(Suppl 1):S15. doi: 10.1186/1471-2105-8-S1-S15.

Analysis of invariant sequences in 266 complete genomes.

Gene. 2007 Oct 15;401(1-2):172-80. doi: 10.1016/j.gene.2007.07.017. Epub 2007 Aug 1.

Cited by

Word decoding of protein amino Acid sequences with availability analysis: a linguistic approach.

PLoS One. 2012;7(11):e50039. doi: 10.1371/journal.pone.0050039. Epub 2012 Nov 21.

n-Gram characterization of genomic islands in bacterial genomes.

Comput Methods Programs Biomed. 2009 Mar;93(3):241-56. doi: 10.1016/j.cmpb.2008.10.014. Epub 2008 Dec 19.
