A fast algorithm for genome-wide analysis of proteins with repeated sequences.

Authors

Pellegrini M, Marcotte E M, Yeates T O

Affiliation

Molecular Biology Institute and UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, University of California, Los Angeles, 90095-1570, USA.

Publication information

Proteins. 1999 Jun 1;35(4):440-6.

PMID:10382671

Abstract

We present a fast algorithm to search for repeating fragments within protein sequences. The technique is based on an extension of the Smith-Waterman algorithm that allows the calculation of sub-optimal alignments of a sequence against itself. We are able to estimate the statistical significance of all sub-optimal alignment scores. We also rapidly determine the length of the repeating fragment and the number of times it is found in a sequence. The technique is applied to sequences in the Swissprot database, and to 16 complete genomes. We find that eukaryotic proteins contain more internal repeats than those of prokaryotic and archaeal organisms. The finding that 18% of yeast sequences and 28% of the known human sequences contain detectable repeats emphasizes the importance of internal duplication in protein evolution.
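
The core idea of aligning a sequence against itself can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `find_best_internal_repeat`, the flat match/mismatch scores, and the linear gap penalty are assumptions made for illustration (a real analysis would normally use a substitution matrix), and the sketch reports only the single best off-diagonal local alignment, without the sub-optimal alignments or significance estimates described in the abstract.

```python
def find_best_internal_repeat(seq, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment of seq against itself.

    Only cells strictly above the main diagonal (j > i) are filled, so the
    trivial alignment of the sequence with itself is excluded and any
    positive score reflects similarity between two different regions.
    Scoring values are illustrative assumptions, not taken from the paper.
    Returns (best_score, (end_i, end_j)) using 1-based end positions.
    """
    n = len(seq)
    prev = [0] * (n + 1)          # scores for row i - 1
    best, best_end = 0, (0, 0)
    for i in range(1, n + 1):
        curr = [0] * (n + 1)      # scores for row i; diagonal cell stays 0
        for j in range(i + 1, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            curr[j] = max(
                0,                    # start a new local alignment
                prev[j - 1] + s,      # extend with an aligned pair
                prev[j] + gap,        # gap in the second copy
                curr[j - 1] + gap,    # gap in the first copy
            )
            if curr[j] > best:
                best, best_end = curr[j], (i, j)
        prev = curr
    return best, best_end


# Toy sequence in which "ACDEF" occurs twice (positions 1-5 and 8-12).
print(find_best_internal_repeat("ACDEFGHACDEF"))  # (10, (5, 12))
```

Restricting the fill to `j > i` halves the work and keeps alignments from passing through the main diagonal; a score of zero means no internal similarity was found under the assumed scoring scheme.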


Similar articles

Homology-based method for identification of protein repeats using statistical significance estimates.

J Mol Biol. 2000 May 5;298(3):521-37. doi: 10.1006/jmbi.2000.3684.

A database of phylogenetically atypical genes in archaeal and bacterial genomes, identified using the DarkHorse algorithm.

BMC Bioinformatics. 2008 Oct 7;9:419. doi: 10.1186/1471-2105-9-419.

A genome phylogeny for mitochondria among alpha-proteobacteria and a predominantly eubacterial ancestry of yeast nuclear genes.

Mol Biol Evol. 2004 Sep;21(9):1643-60. doi: 10.1093/molbev/msh160. Epub 2004 May 21.

Clustering of database sequences for fast homology search using upper bounds on alignment score.

Genome Inform. 2004;15(1):93-104.

De novo identification of highly diverged protein repeats by probabilistic consistency.

Bioinformatics. 2008 Mar 15;24(6):807-14. doi: 10.1093/bioinformatics/btn039. Epub 2008 Feb 1.

A census of protein repeats.

J Mol Biol. 1999 Oct 15;293(1):151-60. doi: 10.1006/jmbi.1999.3136.

The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus.

Nature. 1997 Nov 27;390(6658):364-70. doi: 10.1038/37052.

Fast model-based protein homology detection without alignment.

Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.

Protein family classification based on searching a database of blocks.

Genomics. 1994 Jan 1;19(1):97-107. doi: 10.1006/geno.1994.1018.

Cited by

TReSR: A PCR-compatible DNA sequence design method for engineering proteins containing tandem repeats.

PLoS One. 2023 Apr 12;18(4):e0281228. doi: 10.1371/journal.pone.0281228. eCollection 2023.

Lineage-specific protein repeat expansions and contractions reveal malleable regions of immune genes.

Genes Immun. 2022 Nov;23(7):218-234. doi: 10.1038/s41435-022-00186-4. Epub 2022 Oct 6.

Unraveling the Mechanics of a Repeat-Protein Nanospring: From Folding of Individual Repeats to Fluctuations of the Superhelix.

ACS Nano. 2022 Mar 22;16(3):3895-3905. doi: 10.1021/acsnano.1c09162. Epub 2022 Mar 8.

Testing the length limit of loop grafting in a helical repeat protein.

Curr Res Struct Biol. 2020 Dec 8;3:30-40. doi: 10.1016/j.crstbi.2020.12.002. eCollection 2021.

Self-analysis of repeat proteins reveals evolutionarily conserved patterns.

BMC Bioinformatics. 2020 May 7;21(1):179. doi: 10.1186/s12859-020-3493-y.

Photo-cleavable purification/protection handle assisted synthesis of giant modified proteins with tandem repeats.

Chem Sci. 2019 Aug 12;10(37):8694-8700. doi: 10.1039/c9sc03693h. eCollection 2019 Oct 7.

Identification and Analysis of Long Repeats of Proteins at the Domain Level.

Front Bioeng Biotechnol. 2019 Oct 8;7:250. doi: 10.3389/fbioe.2019.00250. eCollection 2019.

Quantifying Single mRNA Translation Kinetics in Living Cells.

Cold Spring Harb Perspect Biol. 2018 Nov 1;10(11):a032078. doi: 10.1101/cshperspect.a032078.

Folding cooperativity and allosteric function in the tandem-repeat protein class.

Philos Trans R Soc Lond B Biol Sci. 2018 Jun 19;373(1749). doi: 10.1098/rstb.2017.0188.

Genomics of Clostridium taeniosporum, an organism which forms endospores with ribbon-like appendages.

PLoS One. 2018 Jan 2;13(1):e0189673. doi: 10.1371/journal.pone.0189673. eCollection 2018.
