
A fast algorithm for genome-wide analysis of proteins with repeated sequences.

Author information

Pellegrini M, Marcotte E M, Yeates T O

Affiliation

Molecular Biology Institute and UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, University of California, Los Angeles, 90095-1570, USA.

Publication information

Proteins. 1999 Jun 1;35(4):440-6.

Abstract

We present a fast algorithm to search for repeating fragments within protein sequences. The technique is based on an extension of the Smith-Waterman algorithm that allows the calculation of sub-optimal alignments of a sequence against itself. We are able to estimate the statistical significance of all sub-optimal alignment scores. We also rapidly determine the length of the repeating fragment and the number of times it is found in a sequence. The technique is applied to sequences in the Swissprot database, and to 16 complete genomes. We find that eukaryotic proteins contain more internal repeats than those of prokaryotic and archaeal organisms. The finding that 18% of yeast sequences and 28% of the known human sequences contain detectable repeats emphasizes the importance of internal duplication in protein evolution.
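The core idea of aligning a sequence against itself with Smith-Waterman, while excluding the trivial main-diagonal match, can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several simplifying assumptions. It uses a simple match/mismatch score (`match=2`, `mismatch=-1`) and a linear gap penalty (`gap=-2`) instead of a substitution matrix. It reports only the single best off-diagonal alignment, not the full set of sub-optimal alignments. It also omits any significance estimate. The names `best_self_alignment`, `SelfHit` and `estimate_copies` are hypothetical. The diagonal offset of the best hit approximates the repeat unit length. Dividing the covered span by that offset gives a rough copy count for tandem repeats.

```python
from dataclasses import dataclass


@dataclass
class SelfHit:
    score: int   # best off-diagonal local alignment score
    offset: int  # j - i at the end cell: shift between the two aligned copies
    start: int   # 0-based index of the first residue in the earlier copy
    end: int     # 0-based index of the last residue in the later copy


def best_self_alignment(seq: str, match: int = 2, mismatch: int = -1,
                        gap: int = -2, min_offset: int = 1) -> SelfHit:
    """Smith-Waterman of seq against itself, restricted to cells with
    j - i >= min_offset so the trivial main-diagonal match is excluded.
    Scoring (match/mismatch/linear gap) is an illustrative assumption."""
    n = len(seq)
    H = [[0] * (n + 1) for _ in range(n + 1)]  # local alignment scores
    S = [[0] * (n + 1) for _ in range(n + 1)]  # start row of alignment ending here
    best = SelfHit(0, 0, 0, -1)
    for i in range(1, n + 1):
        for j in range(i + min_offset, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            diag = H[i - 1][j - 1] + s
            up = H[i - 1][j] + gap      # residue i aligned to a gap
            left = H[i][j - 1] + gap    # residue j aligned to a gap
            h = max(0, diag, up, left)
            if h == 0:
                continue  # local alignment restarts; cell stays 0
            if h == diag:
                # extend an existing alignment, or open a new one at this cell
                S[i][j] = S[i - 1][j - 1] if H[i - 1][j - 1] > 0 else i - 1
            elif h == up:
                S[i][j] = S[i - 1][j]
            else:
                S[i][j] = S[i][j - 1]
            H[i][j] = h
            if h > best.score:
                best = SelfHit(h, j - i, S[i][j], j - 1)
    return best


def estimate_copies(hit: SelfHit) -> int:
    """Crude tandem-repeat heuristic (an assumption, not the paper's method):
    span covered by the alignment divided by the diagonal offset."""
    if hit.offset == 0:
        return 0
    return round((hit.end - hit.start + 1) / hit.offset)


hit = best_self_alignment("ACDEFGHIK" * 3)
print(hit, estimate_copies(hit))  # SelfHit(score=36, offset=9, start=0, end=26) 3
```

For a perfect three-copy tandem repeat of a 9-residue unit, the best hit lies on the diagonal with offset 9 and the copy estimate is 3. For dispersed, non-tandem repeats the copy heuristic is not meaningful. The sketch also uses O(n²) memory, which is fine for single proteins but not tuned for genome-scale runs.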

