How significant is a protein structure similarity with TM-score = 0.5?

Author information

Department of Medical School, Center for Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA.

Publication information

Bioinformatics. 2010 Apr 1;26(7):889-95. doi: 10.1093/bioinformatics/btq066. Epub 2010 Feb 17.

Abstract

MOTIVATION

Protein structure similarity is often measured by root mean squared deviation, global distance test score and template modeling score (TM-score). However, the scores themselves do not indicate how statistically significant a structural similarity is. There is also no quantitative relation between the scores and conventional fold classifications. This article aims to answer two questions: (i) What is the statistical significance of a given TM-score? (ii) What is the probability that two proteins have the same fold, given a specific TM-score?
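For orientation, the sketch below shows how a TM-score is commonly computed for one fixed superposition. It is not taken from this abstract. It assumes the standard definition, in which each aligned residue pair contributes 1 / (1 + (d_i / d0)^2) and d0 = 1.24 (L - 15)^(1/3) - 1.8 Å depends on the target length L. The function names and the 0.5 Å floor on d0 for short chains are illustrative assumptions. The full score also maximizes over superpositions, which is omitted here.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 in angstroms (standard TM-score form)."""
    if target_length <= 15:
        return 0.5  # assumed floor: the cube-root formula is not usable here
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)  # assumed floor for short chains


def tm_score_fixed_superposition(distances, target_length: int) -> float:
    """TM-score for one given superposition.

    `distances` holds the distance (in angstroms) between each aligned
    residue pair after superposition. Unaligned target residues contribute
    nothing, so the sum is normalized by the full target length.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    length = 100
    d0 = tm_d0(length)
    # Every residue pair sitting exactly d0 apart yields a score of one half.
    print(tm_score_fixed_superposition([d0] * length, length))
```

Because each term is divided by the target length and scaled by a length-dependent d0, the score stays between 0 and 1 and is designed to be comparable across proteins of different sizes.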

RESULTS

We first made an all-to-all gapless structural match on 6684 non-homologous single-domain proteins in the PDB and found that the TM-scores follow an extreme value distribution. The data allow us to assign each TM-score a P-value that measures the chance of two randomly selected proteins obtaining an equal or higher TM-score. For a TM-score of 0.5, for instance, the P-value is 5.5 × 10^-7, which means that about 1.8 million random protein pairs must be considered to expect one with a TM-score of at least 0.5. Second, we examine the posterior probability that two proteins share the same fold, using three datasets: SCOP, CATH and the consensus of SCOP and CATH. The posterior probabilities from the different datasets show a similarly rapid phase transition around TM-score = 0.5. This finding indicates that TM-score can be used as an approximate but quantitative criterion for protein topology classification, i.e. protein pairs with a TM-score >0.5 are mostly in the same fold, while those with a TM-score <0.5 are mainly not in the same fold.
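The 1.8 million figure is simply the reciprocal of the reported P-value. A minimal sketch of that arithmetic follows. The only number taken from the abstract is the P-value at TM-score = 0.5. The function names and the assumption that random pairs behave as independent trials are illustrative, not part of the paper.

```python
import math

# P-value reported in the abstract for TM-score = 0.5.
P_VALUE_AT_TM_0_5 = 5.5e-7


def expected_pairs_per_hit(p_value: float) -> float:
    """Average number of random pairs examined per chance hit (1 / P)."""
    return 1.0 / p_value


def prob_at_least_one_hit(p_value: float, n_pairs: int) -> float:
    """Chance of at least one hit among n_pairs random pairs.

    Assumes independent pairs (an illustrative simplification) and
    evaluates 1 - (1 - p)^n in a numerically stable way for tiny p.
    """
    return -math.expm1(n_pairs * math.log1p(-p_value))


if __name__ == "__main__":
    print(f"pairs per chance hit: {expected_pairs_per_hit(P_VALUE_AT_TM_0_5):,.0f}")
    print(f"P(at least one hit in 1.8M pairs): "
          f"{prob_at_least_one_hit(P_VALUE_AT_TM_0_5, 1_800_000):.3f}")
```

Under the independence assumption, 1.8 million pairs is the number at which one chance hit is expected on average. It does not guarantee a hit, which is why the second function reports a probability below 1.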
