Exploring protein structural dissimilarity to facilitate structure classification.

Author information

Jain Pooja, Hirst Jonathan D

Affiliation

School of Chemistry, The University of Nottingham, University Park, Nottingham NG7 2RD, UK.

Publication information

BMC Struct Biol. 2009 Sep 19;9:60. doi: 10.1186/1472-6807-9-60.

DOI:10.1186/1472-6807-9-60

PMID:19765314

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2754988/

Abstract

BACKGROUND

Classification of newly resolved protein structures is important in understanding their architectural, evolutionary and functional relatedness to known protein structures. Among various efforts to improve the database of Structural Classification of Proteins (SCOP), automation has received particular attention. Herein, we predict the deepest SCOP structural level that an unclassified protein shares with classified proteins with an equal number of secondary structure elements (SSEs).
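To illustrate what "the deepest SCOP structural level shared" means, the sketch below compares two SCOP concise classification strings (sccs), whose dot-separated fields encode class, fold, super-family and family in that order (for example `a.4.5.1`). The function name and the example identifiers are hypothetical and are not taken from the paper; the paper predicts this level from structural descriptors, whereas the sketch only reads it off known labels.

```python
from typing import Optional

# SCOP levels encoded, in order, by the dot-separated fields of an sccs string.
LEVELS = ("Class", "Fold", "Super-family", "Family")


def deepest_shared_level(sccs_a: str, sccs_b: str) -> Optional[str]:
    """Return the deepest SCOP level shared by two sccs strings, or None.

    Hypothetical helper for illustration only.
    """
    shared = None
    for level, a, b in zip(LEVELS, sccs_a.split("."), sccs_b.split(".")):
        if a != b:
            break  # the hierarchy diverges here; deeper levels cannot be shared
        shared = level
    return shared


# Illustrative identifiers: same class, fold and super-family, different family.
print(deepest_shared_level("a.4.5.1", "a.4.5.2"))
```

A pair that differs already in the first field shares no level, so the function returns `None` in that case.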

RESULTS

We compute a coefficient of dissimilarity (Omega) between proteins, based on structural and sequence-based descriptors characterising the respective constituent SSEs. For a set of 1,661 pairs of proteins with sequence identity up to 35%, the performance of Omega in predicting shared Class, Fold and Super-family levels is comparable to that of DaliLite Z score and shows a greater than four-fold increase in the true positive rate (TPR) for proteins sharing the Family level. On a larger set of 600 domains representing 200 families, the performance of Z score improves in predicting a shared Family, but still only achieves about half of the TPR of Omega. The TPR for structures sharing a Super-family is lower than in the first dataset, but Omega performs slightly better than Z score. Overall, the sensitivity of Omega in predicting common Fold level is higher than that of the DaliLite Z score.
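The true positive rate reported above is TP / (TP + FN): the fraction of pairs that truly share a level and are also predicted to share it. A minimal sketch follows, assuming a pair is predicted to share a level when its dissimilarity is at or below a cut-off. The cut-off rule, the function name and the toy numbers are illustrative assumptions, not the paper's procedure or data.

```python
from typing import Sequence


def true_positive_rate(
    scores: Sequence[float],
    shares_level: Sequence[bool],
    threshold: float,
) -> float:
    """TPR = TP / (TP + FN) for a dissimilarity score.

    Assumption: a pair is called positive when score <= threshold
    (lower dissimilarity means more similar).
    """
    # Keep only the scores of pairs that truly share the level.
    positives = [s for s, truth in zip(scores, shares_level) if truth]
    if not positives:
        raise ValueError("no pairs truly share the level")
    true_positives = sum(1 for s in positives if s <= threshold)
    return true_positives / len(positives)


# Toy data: three pairs truly share the level, two of them fall under the cut-off.
toy_scores = [0.1, 0.4, 0.9, 0.2]
toy_truth = [True, True, True, False]
print(true_positive_rate(toy_scores, toy_truth, threshold=0.5))
```

For a similarity measure such as a Z score, where higher values mean more similar, the comparison would be reversed.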

CONCLUSION

Classification at deeper levels of the hierarchy is more specific and more difficult, so the efficiency of Omega may be attractive to the curators and end-users of SCOP. We suggest that Omega may be a better measure for structure classification than the DaliLite Z score, with the caveat that we are currently restricted to comparing structures with an equal number of SSEs.
