A unified statistical framework for sequence comparison and structure comparison.

Authors

Levitt M, Gerstein M

Affiliation

Department of Structural Biology, Stanford University, Stanford, CA 94305, USA.

Publication

Proc Natl Acad Sci U S A. 1998 May 26;95(11):5913-20. doi: 10.1073/pnas.95.11.5913.

Abstract

We present an approach for assessing the significance of sequence and structure comparisons by using nearly identical statistical formalisms for both sequence and structure. Doing so involves an all-vs.-all comparison of protein domains [taken here from the Structural Classification of Proteins (scop) database] and then fitting a simple distribution function to the observed scores. By using this distribution, we can attach a statistical significance to each comparison score in the form of a P value, the probability that a better score would occur by chance. As expected, we find that the scores for sequence matching follow an extreme-value distribution. The agreement, moreover, between the P values that we derive from this distribution and those reported by standard programs (e.g., BLAST and FASTA) validates our approach. Structure comparison scores also follow an extreme-value distribution when the statistics are expressed in terms of a structural alignment score (essentially the sum of reciprocated distances between aligned atoms minus gap penalties). We find that the traditional metric of structural similarity, the rms deviation in atom positions after fitting aligned atoms, follows a different distribution of scores and does not perform as well as the structural alignment score. Comparison of the sequence and structure statistics for pairs of proteins known to be related distantly shows that structural comparison is able to detect approximately twice as many distant relationships as sequence comparison at the same error rate. The comparison also indicates that there are very few pairs with significant similarity in terms of sequence but not structure whereas many pairs have significant similarity in terms of structure but not sequence.
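
The P-value step described in the abstract (fit an extreme-value distribution to background scores, then read off the probability of a better score by chance) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a type I extreme-value (Gumbel) form, fits its two parameters by the method of moments (the abstract does not say how the paper fits its distribution), and uses synthetic scores in place of the scop all-vs.-all data. The function names `fit_gumbel_moments` and `p_value` are hypothetical.

```python
import math
import random
import statistics

# Euler-Mascheroni constant: the mean of a standard Gumbel distribution.
EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(scores):
    """Fit a type I extreme-value (Gumbel, maximum) distribution by the method of moments.

    Assumption: the background scores follow Gumbel(mu, beta), for which
    mean = mu + gamma * beta and variance = (pi * beta) ** 2 / 6.
    Inverting these two relations gives the estimates below.
    Returns (mu, beta), the location and scale parameters.
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    beta = sd * math.sqrt(6.0) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def p_value(score, mu, beta):
    """P(S > score): probability that an unrelated pair scores better by chance.

    Gumbel survival function: 1 - exp(-exp(-(score - mu) / beta)).
    math.expm1 keeps precision when the P value is tiny (very high scores).
    """
    z = (score - mu) / beta
    if z < -700.0:
        # exp(-z) would overflow; the survival probability is 1 to double precision.
        return 1.0
    return -math.expm1(-math.exp(-z))


if __name__ == "__main__":
    # Hypothetical stand-in for all-vs.-all scores of unrelated pairs:
    # draws from Gumbel(30, 5) via inverse-CDF sampling, x = mu - beta * ln(-ln u).
    rng = random.Random(0)
    background = [30.0 - 5.0 * math.log(-math.log(rng.random())) for _ in range(20000)]
    mu, beta = fit_gumbel_moments(background)
    print(f"fitted mu = {mu:.2f}, beta = {beta:.2f}")
    for s in (40.0, 60.0, 90.0):
        print(f"score {s:5.1f}  P = {p_value(s, mu, beta):.2e}")
```

With 20,000 synthetic draws the fitted parameters should land close to the generating values of 30 and 5. With real data, `background` would hold the scores from comparing unrelated domains, and `p_value` would then be applied to each pair's score; a maximum-likelihood fit could replace the moment estimates, which are used here only because they need nothing beyond the standard library.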

References

Identifying distantly related protein sequences.
Comput Appl Biosci. 1997 Aug;13(4):325-32. doi: 10.1093/bioinformatics/13.4.325.

SCOP: a structural classification of proteins database.
Nucleic Acids Res. 1997 Jan 1;25(1):236-9. doi: 10.1093/nar/25.1.236.

Surprising similarities in structure comparison.
Curr Opin Struct Biol. 1996 Jun;6(3):377-85. doi: 10.1016/s0959-440x(96)80058-3.