Rapid protein domain assignment from amino acid sequence using predicted secondary structure.

Author information

Marsden Russell L, McGuffin Liam J, Jones David T

Affiliation

Bioinformatics Unit, Department of Computer Science, University College London, UK.

Publication information

Protein Sci. 2002 Dec;11(12):2814-24. doi: 10.1110/ps.0209902.

PMID:12441380

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2373756/

Abstract

The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully the delineation of continuous domains can be accomplished in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken with a view to measuring the usefulness of these prediction methods in terms of their application to fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top scoring predictions. We have implemented a new continuous domain identification method using the alignment of predicted secondary structures of target sequences against observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for domain number and location of domain boundaries was correct for 24% of the multidomain set (+/-20 residues). These results have been put into context in relation to the results obtained from the other prediction methods assessed.
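The evaluation criterion described in the abstract (correct domain number, with boundaries within ±20 residues) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `boundaries_match` and `top_prediction_success_rate`, the representation of each chain as a list of boundary positions, and the pairing of boundaries in sequence order are all assumptions made for illustration. For continuous domains, a chain with n boundaries has n + 1 domains, so comparing boundary counts also compares domain number.

```python
from typing import List, Sequence


def boundaries_match(predicted: Sequence[int],
                     observed: Sequence[int],
                     tolerance: int = 20) -> bool:
    """Return True if the predicted domain count equals the observed one
    and every predicted boundary lies within `tolerance` residues of the
    corresponding observed boundary (boundaries paired in sequence order).

    Hypothetical helper: boundaries are residue positions at which one
    continuous domain ends and the next begins.
    """
    if len(predicted) != len(observed):
        # Different number of boundaries means different domain number.
        return False
    return all(abs(p - o) <= tolerance
               for p, o in zip(sorted(predicted), sorted(observed)))


def top_prediction_success_rate(predictions: List[Sequence[int]],
                                references: List[Sequence[int]],
                                tolerance: int = 20) -> float:
    """Fraction of chains whose top-scoring prediction matches the
    reference assignment under the tolerance criterion above."""
    if not references or len(predictions) != len(references):
        raise ValueError("need one top prediction per reference chain")
    hits = sum(boundaries_match(p, r, tolerance)
               for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    # Toy data only; these are not values from the study.
    predicted = [[100], [50, 200], [80]]
    observed = [[110], [50, 230], [80]]
    print(top_prediction_success_rate(predicted, observed))
```

An empty boundary list stands for a single-domain chain, so the same check covers both the domain-number question and the boundary-placement question.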
