A homology identification method that combines protein sequence and structure information.

Authors

Yu L, White JV, Smith TF

Affiliation

BioMolecular Engineering Research Center, College of Engineering, Boston University, Massachusetts 02215, USA.

Publication

Protein Sci. 1998 Dec;7(12):2499-510. doi: 10.1002/pro.5560071203.

Abstract

A new method is presented for identifying distantly related homologous proteins that are unrecognizable by conventional sequence comparison methods. The method combines information about functionally conserved sequence patterns with information about structure context. This information is encoded in stochastic discrete state-space models (DSMs) that comprise a new family of hidden Markov models. The new models are called sequence-pattern-embedded DSMs (pDSMs). This method can identify distantly related protein family members with a high sensitivity and specificity. The method is illustrated with trypsin-like serine proteases and globins. The strategy for building pDSMs is presented. The method has been validated using carefully constructed positive and negative control sets. In addition to the ability to recognize remote homologs, pDSM sequence analysis predicts secondary structures with higher sensitivity, specificity, and Q3 accuracy than DSM analysis, which omits information about conserved sequence patterns. The identification of trypsin-like serine proteases in new genomes is discussed.
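The abstract compares pDSM and DSM analysis by sensitivity, specificity, and Q3 accuracy of secondary-structure prediction. As a minimal sketch of how such metrics are commonly computed, the Python below scores a predicted three-state string against an observed one. It is not the authors' evaluation code. The function names, the H/E/C alphabet, and the toy strings are assumptions made for illustration, and specificity is taken here as TN / (TN + FP), which may differ from the definition used in the paper.

```python
# Illustrative only: names, alphabet, and definitions are assumptions,
# not taken from the paper.
STATES = ("H", "E", "C")  # helix, strand, coil


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def per_state_metrics(predicted: str, observed: str) -> dict:
    """Return {state: (sensitivity, specificity)} for each of H, E, C.

    Assumed definitions: sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP). Undefined ratios are reported as NaN.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    metrics = {}
    for state in STATES:
        pairs = list(zip(predicted, observed))
        tp = sum(p == state and o == state for p, o in pairs)
        fn = sum(p != state and o == state for p, o in pairs)
        fp = sum(p == state and o != state for p, o in pairs)
        tn = sum(p != state and o != state for p, o in pairs)
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        metrics[state] = (sensitivity, specificity)
    return metrics


# Toy example (made-up labels, not data from the paper).
observed = "HHHHEEEECC"
predicted = "HHHCEEECCC"
q3 = q3_accuracy(predicted, observed)
metrics = per_state_metrics(predicted, observed)
```

On this toy pair, 8 of 10 residues match, so `q3` is 0.8. Helix has sensitivity 0.75 and specificity 1.0, because one helical residue is missed and no non-helical residue is called helix.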

Similar articles

6. Identifying distantly related protein sequences. Comput Appl Biosci. 1997 Aug;13(4):325-32. doi: 10.1093/bioinformatics/13.4.325.

Cited by

2. Fungi and animals may share a common ancestor to nuclear receptors. Proc Natl Acad Sci U S A. 2006 May 2;103(18):7077-81. doi: 10.1073/pnas.0510080103. Epub 2006 Apr 24.

References

1. Predicting protein structure using hidden Markov models. Proteins. 1997;Suppl 1:134-9. doi: 10.1002/(sici)1097-0134(1997)1+<134::aid-prot18>3.3.co;2-q.
3. Identifying distantly related protein sequences. Comput Appl Biosci. 1997 Aug;13(4):325-32. doi: 10.1093/bioinformatics/13.4.325.
4. The complete genome sequence of Escherichia coli K-12. Science. 1997 Sep 5;277(5331):1453-62. doi: 10.1126/science.277.5331.1453.
7. Overview of the yeast genome. Nature. 1997 May 29;387(6632 Suppl):7-65. doi: 10.1038/42755.
