Automatic generation of primary sequence patterns from sets of related protein sequences.

Author information

Smith R F, Smith T F

Affiliation

Department of Biostatistics, Dana-Farber Cancer Institute, Boston, MA 02115.

Publication information

Proc Natl Acad Sci U S A. 1990 Jan;87(1):118-22. doi: 10.1073/pnas.87.1.118.

Abstract

We have developed a computer algorithm that can extract the pattern of conserved primary sequence elements common to all members of a homologous protein family. The method involves clustering the pairwise similarity scores among a set of related sequences to generate a binary dendrogram (tree). The tree is then reduced in a stepwise manner by progressively replacing the node connecting the two most similar termini by one common pattern until only a single common "root" pattern remains. A pattern is generated at a node by (i) performing a local optimal alignment on the sequence/pattern pair connected by the node with the use of an extended dynamic programming algorithm and then (ii) constructing a single common pattern from this alignment with a nested hierarchy of amino acid classes to identify the minimal inclusive amino acid class covering each paired set of elements in the alignment. Gaps within an alignment are created and/or extended using a "pay once" gap penalty rule, and gapped positions are converted into gap characters that function as 0 or 1 amino acid of any type during subsequent alignment. This method has been used to generate a library of covering patterns for homologous families in the National Biomedical Research Foundation/Protein Identification Resource protein sequence data base. We show that a covering pattern can be more diagnostic for sequence family membership than any of the individual sequences used to construct the pattern.
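Step (ii) of the abstract, building one common pattern from an aligned pair by choosing the minimal inclusive amino acid class at each position, can be illustrated with a minimal sketch. The class hierarchy below is hypothetical and chosen only for illustration; it is not the hierarchy used by the authors. The symbol names (`CLASSES`, `GAP`, `cover`, `merge_alignment`) are likewise assumptions. The input is taken to be an already aligned pair, with `-` marking alignment gaps, so the dynamic programming alignment and the "pay once" gap penalty are not shown.

```python
from typing import Dict, FrozenSet

# Hypothetical nested hierarchy of amino acid classes (illustrative only,
# NOT the classes from the paper). Lowercase letters name classes,
# uppercase letters are single residues, "X" is the class of all residues.
CLASSES: Dict[str, FrozenSet[str]] = {
    "a": frozenset("ILVM"),                  # aliphatic-like
    "r": frozenset("FWY"),                   # aromatic
    "h": frozenset("ILVMFWY"),               # hydrophobic, contains a and r
    "p": frozenset("KRH"),                   # positively charged
    "n": frozenset("DE"),                    # negatively charged
    "c": frozenset("KRHDE"),                 # charged, contains p and n
    "X": frozenset("ACDEFGHIKLMNPQRSTVWY"),  # any residue
}

GAP = "g"  # pattern gap character: stands for 0 or 1 residue of any type


def members(symbol: str) -> FrozenSet[str]:
    """Return the set of residues that a pattern symbol stands for."""
    return CLASSES.get(symbol, frozenset(symbol))


def cover(x: str, y: str) -> str:
    """Return the smallest symbol covering both aligned elements."""
    # A gapped position becomes a gap character in the resulting pattern.
    if x in ("-", GAP) or y in ("-", GAP):
        return GAP
    union = members(x) | members(y)
    if len(union) == 1:
        # Both elements are the same single residue: keep it as-is.
        return next(iter(union))
    # Pick the smallest class that contains every residue in the union.
    # "X" contains all residues, so a covering class always exists.
    candidates = [name for name, s in CLASSES.items() if union <= s]
    return min(candidates, key=lambda name: len(CLASSES[name]))


def merge_alignment(top: str, bottom: str) -> str:
    """Collapse an aligned sequence/pattern pair into one common pattern."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    return "".join(cover(x, y) for x, y in zip(top, bottom))


if __name__ == "__main__":
    print(merge_alignment("KIDF", "RVEW"))  # panr
    print(merge_alignment("K-L", "KAL"))    # KgL
    print(merge_alignment("aA", "FG"))      # hX
```

Because a pattern symbol can itself be a class, the same function can be applied again when a pattern is merged with another sequence or pattern higher up the tree, which is how the stepwise reduction toward a single root pattern would reuse it under these assumptions.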
