The rapid generation of mutation data matrices from protein sequences.

Authors

Jones D T, Taylor W R, Thornton J M

Affiliation

Department of Biochemistry and Molecular Biology, University College, London, UK.

Publication details

Comput Appl Biosci. 1992 Jun;8(3):275-82. doi: 10.1093/bioinformatics/8.3.275.

Abstract

An efficient means of generating mutation data matrices from large numbers of protein sequences is presented. Using an approximate peptide-based sequence comparison algorithm, the set of sequences is clustered at the 85% identity level. The most closely related pairs of sequences are aligned, and the observed amino acid exchanges are tallied in a matrix. The raw mutation frequency matrix is processed in a way similar to that described by Dayhoff et al. (1978), so the resulting matrices can easily be used in current sequence analysis applications in place of the standard mutation data matrices, which have not been updated for 13 years. The method is fast enough to process the entire SWISS-PROT databank in 20 h on a Sun SPARCstation 1, and to generate a matrix from a specific family or class of proteins in minutes. Differences observed between our 250 PAM mutation data matrix and the matrix calculated by Dayhoff et al. are briefly discussed.
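
The tallying step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the sequence pairs have already been aligned (gaps written as `-`), it does not reproduce the approximate peptide-based clustering, and it treats the 85% identity level as a simple per-pair filter. The function names `percent_identity` and `tally_exchanges` and the symmetric counting convention are hypothetical choices made for illustration.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def tally_exchanges(aligned_pairs, min_identity=85.0):
    """Count observed amino acid exchanges in pre-aligned sequence pairs.

    Assumption: pairs below `min_identity` are skipped, and each exchange
    is counted in both directions so the raw matrix is symmetric.
    """
    counts = Counter()
    for a, b in aligned_pairs:
        if len(a) != len(b):
            raise ValueError("aligned sequences must have equal length")
        if percent_identity(a, b) < min_identity:
            continue
        for x, y in zip(a, b):
            # Ignore gaps, ambiguity codes, and identical residues.
            if x in AMINO_ACIDS and y in AMINO_ACIDS and x != y:
                counts[(x, y)] += 1
                counts[(y, x)] += 1
    return counts


if __name__ == "__main__":
    pairs = [
        ("ACDEFGHIKL", "ACDEFGHIKV"),  # 90% identical: one L/V exchange
        ("ACDEFGHIKL", "AAAAAGHIKL"),  # 60% identical: skipped
    ]
    for (x, y), n in sorted(tally_exchanges(pairs).items()):
        print(x, y, n)
```

A raw count table of this kind would be the input to the Dayhoff-style processing mentioned in the abstract. That later step is not shown here, because the abstract does not give its details.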
