A homology identification method that combines protein sequence and structure information.

Authors

Yu L, White J V, Smith T F

Affiliation

BioMolecular Engineering Research Center, College of Engineering, Boston University, Massachusetts 02215, USA.

Publication information

Protein Sci. 1998 Dec;7(12):2499-510. doi: 10.1002/pro.5560071203.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2143896/

Abstract

A new method is presented for identifying distantly related homologous proteins that are unrecognizable by conventional sequence comparison methods. The method combines information about functionally conserved sequence patterns with information about structure context. This information is encoded in stochastic discrete state-space models (DSMs) that comprise a new family of hidden Markov models. The new models are called sequence-pattern-embedded DSMs (pDSMs). This method can identify distantly related protein family members with a high sensitivity and specificity. The method is illustrated with trypsin-like serine proteases and globins. The strategy for building pDSMs is presented. The method has been validated using carefully constructed positive and negative control sets. In addition to the ability to recognize remote homologs, pDSM sequence analysis predicts secondary structures with higher sensitivity, specificity, and Q3 accuracy than DSM analysis, which omits information about conserved sequence patterns. The identification of trypsin-like serine proteases in new genomes is discussed.
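The abstract compares pDSM and DSM analysis by sensitivity, specificity, and Q3 accuracy of secondary-structure prediction. As a minimal sketch of how such figures are commonly computed, the Python below scores a predicted three-state string (H = helix, E = strand, C = coil) against an observed one. The function names, the example strings, and the definition of specificity as TN / (TN + FP) are assumptions made for illustration; the paper may define these measures differently (some secondary-structure studies use TP / (TP + FP) for specificity), and none of this is the authors' code.

```python
from typing import Dict


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def per_state_metrics(predicted: str, observed: str, state: str) -> Dict[str, float]:
    """Sensitivity and specificity for one state, treated as the positive class.

    Assumed definitions: sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP).
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must be of equal length")
    pairs = list(zip(predicted, observed))
    tp = sum(p == state and o == state for p, o in pairs)
    fn = sum(p != state and o == state for p, o in pairs)
    tn = sum(p != state and o != state for p, o in pairs)
    fp = sum(p == state and o != state for p, o in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return {"sensitivity": sensitivity, "specificity": specificity}


# Hypothetical 10-residue example (not data from the paper).
observed = "HHHHCCEEEE"
predicted = "HHHCCCEEEC"

q3 = q3_accuracy(predicted, observed)
helix = per_state_metrics(predicted, observed, "H")
print(f"Q3 = {q3:.2f}")
print(f"helix sensitivity = {helix['sensitivity']:.2f}, "
      f"specificity = {helix['specificity']:.2f}")
```

Q3 treats all three states equally, so the per-state sensitivity and specificity are useful alongside it when one state (for example coil) dominates the sequence.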
