Identifying property based sequence motifs in protein families and superfamilies: application to DNase-1 related endonucleases.

Author information

Mathura Venkatarajan S, Schein Catherine H, Braun Werner

Affiliation

Sealy Center for Structural Biology, Department of Human Biological Chemistry and Genetics, University of Texas Medical Branch, Galveston, TX 77555-1157, USA.

Publication information

Bioinformatics. 2003 Jul 22;19(11):1381-90. doi: 10.1093/bioinformatics/btg164.

DOI:10.1093/bioinformatics/btg164

PMID:12874050

Abstract

MOTIVATION

Identification of short conserved sequence motifs common to a protein family or superfamily can be more useful than overall sequence similarity in suggesting the function of novel gene products. Locating motifs still requires expert knowledge, as automated methods using stringent criteria may not differentiate subtle similarities from statistical noise.

RESULTS

We have developed a novel automatic method, based on patterns of conservation of 237 physical-chemical properties of amino acids in aligned protein sequences, to find related motifs in proteins with little or no overall sequence similarity. As an application, our web-server MASIA identified 12 property-based motifs in the apurinic/apyrimidinic endonuclease (APE) family of DNA-repair enzymes of the DNase-I superfamily. Searching with these motifs located distantly related representatives of the DNase-I superfamily, such as Inositol 5'-polyphosphate phosphatases in the ASTRAL40 database, using a Bayesian scoring function. Other proteins containing APE motifs had no overall sequence or structural similarity. However, all were phosphatases and/or had a metal ion binding active site. Thus our automated method can identify discrete elements in distantly related proteins that define local structure and aspects of function. We anticipate that our method will complement existing ones to functionally annotate novel protein sequences from genomic projects.

AVAILABILITY

MASIA WEB site: http://www.scsb.utmb.edu/masia/masia.html

SUPPLEMENTARY INFORMATION

The dendrogram of 42 APE sequences used to derive motifs is available on http://www.scsb.utmb.edu/comp_biol.html/DNA_repair/publication.html
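
The idea stated under RESULTS, that a motif can be defined by conserved physical-chemical properties rather than conserved residue identities, can be illustrated with a minimal sketch. This is not the MASIA algorithm: it uses a single property (Kyte-Doolittle hydropathy) in place of the 237 properties mentioned in the abstract, it omits the Bayesian scoring step entirely, and the function names, the `max_spread` threshold, and the `min_len` window length are hypothetical choices made only for illustration.

```python
from statistics import pstdev

# Kyte-Doolittle hydropathy values, used here as a stand-in for ONE property scale.
# (Assumption: the method in the abstract draws on 237 properties, not this scale.)
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_property_spread(alignment, scale=HYDROPATHY):
    """Return the population standard deviation of the property in each column.

    Gap characters and unknown residues are skipped; a column with no
    scorable residue yields None.
    """
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    spreads = []
    for column in zip(*alignment):
        values = [scale[res] for res in column if res in scale]
        spreads.append(pstdev(values) if values else None)
    return spreads


def conserved_windows(spreads, max_spread=1.0, min_len=3):
    """Return half-open (start, end) column ranges where the property is conserved.

    A column counts as conserved when its spread is at most max_spread;
    only runs of at least min_len consecutive conserved columns are kept.
    Both thresholds are illustrative, not values from the paper.
    """
    runs = []
    start = None
    # A trailing None acts as a sentinel that closes any run still open.
    for i, spread in enumerate(spreads + [None]):
        conserved = spread is not None and spread <= max_spread
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Toy alignment: the first three columns mix L, I and V, so no residue is
# identical across sequences, yet hydropathy is nearly constant there.
toy_alignment = ["LIVAKDE", "IVLKRDE", "VLIGKEQ"]
toy_spreads = column_property_spread(toy_alignment)
print(conserved_windows(toy_spreads))
```

In the toy alignment, an identity-based consensus would find nothing in the first three columns, whereas the property-based view treats them as one conserved stretch. That contrast is the point of the sketch; real motif detection would need many properties and a proper significance measure.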

Similar articles

Total sequence decomposition distinguishes functional modules, "molegos" in apurinic/apyrimidinic endonucleases.

BMC Bioinformatics. 2002 Nov 25;3:37. doi: 10.1186/1471-2105-3-37.

MASIA: recognition of common patterns and properties in multiple aligned protein sequences.

Bioinformatics. 2000 Oct;16(10):950-1. doi: 10.1093/bioinformatics/16.10.950.

SCANMOT: searching for similar sequences using a simultaneous scan of multiple sequence motifs.

Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W274-6. doi: 10.1093/nar/gki493.

PASS2: an automated database of protein alignments organised as structural superfamilies.

BMC Bioinformatics. 2004 Apr 2;5:35. doi: 10.1186/1471-2105-5-35.

Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model.

BMC Bioinformatics. 2004 Oct 25;5:157. doi: 10.1186/1471-2105-5-157.

3MOTIF: visualizing conserved protein sequence motifs in the protein structure database.

Bioinformatics. 2003 Mar 1;19(4):541-2. doi: 10.1093/bioinformatics/btf862.

PROMALS: towards accurate multiple sequence alignments of distantly related proteins.

Bioinformatics. 2007 Apr 1;23(7):802-8. doi: 10.1093/bioinformatics/btm017. Epub 2007 Jan 31.

Rapid detection, classification and accurate alignment of up to a million or more related protein sequences.

Bioinformatics. 2009 Aug 1;25(15):1869-75. doi: 10.1093/bioinformatics/btp342. Epub 2009 Jun 8.

FunClust: a web server for the identification of structural motifs in a set of non-homologous protein structures.

BMC Bioinformatics. 2008 Mar 26;9 Suppl 2(Suppl 2):S2. doi: 10.1186/1471-2105-9-S2-S2.

Cited by

The updated Structural Database of Allergenic Proteins (SDAP 2.0) provides 3D models for allergens and incorporated bioinformatics tools.

J Allergy Clin Immunol Glob. 2023 Aug 11;2(4):100162. doi: 10.1016/j.jacig.2023.100162. eCollection 2023 Nov.

Methods Mol Biol. 2024;2717:269-284. doi: 10.1007/978-1-0716-3453-0_18.

Still SDAPing Along: 20 Years of the Structural Database of Allergenic Proteins.

Front Allergy. 2022 Mar 22;3:863172. doi: 10.3389/falgy.2022.863172. eCollection 2022.

Distinguishing allergens from non-allergenic homologues using Physical-Chemical Property (PCP) motifs.

Mol Immunol. 2018 Jul;99:1-8. doi: 10.1016/j.molimm.2018.03.022. Epub 2018 Apr 6.

Functional classification of protein toxins as a basis for bioinformatic screening.

Sci Rep. 2017 Oct 24;7(1):13940. doi: 10.1038/s41598-017-13957-1.

Sequence-motif detection of NAD(P)-binding proteins: discovery of a unique antibacterial drug target.

Sci Rep. 2014 Sep 25;4:6471. doi: 10.1038/srep06471.

AllerTOP v.2--a server for in silico prediction of allergens.

J Mol Model. 2014 Jun;20(6):2278. doi: 10.1007/s00894-014-2278-5. Epub 2014 May 31.

Base of the measles virus fusion trimer head receives the signal that triggers membrane fusion.

J Biol Chem. 2012 Sep 21;287(39):33026-35. doi: 10.1074/jbc.M112.373308. Epub 2012 Aug 2.

Engineering proteins with enhanced mechanical stability by force-specific sequence motifs.

Proteins. 2012 May;80(5):1308-15. doi: 10.1002/prot.24027. Epub 2012 Feb 10.

AllerML: markup language for allergens.

Regul Toxicol Pharmacol. 2011 Jun;60(1):151-60. doi: 10.1016/j.yrtph.2011.03.006. Epub 2011 Mar 21.
