Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks.

Authors

Tatusov R L, Altschul S F, Koonin E V

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894.

Publication information

Proc Natl Acad Sci U S A. 1994 Dec 6;91(25):12091-5. doi: 10.1073/pnas.91.25.12091.

DOI: 10.1073/pnas.91.25.12091

PMID: 7991589

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC45382/

Abstract

We describe an approach to analyzing protein sequence databases that, starting from a single uncharacterized sequence or group of related sequences, generates blocks of conserved segments. The procedure involves iterative database scans with an evolving position-dependent weight matrix constructed from a coevolving set of aligned conserved segments. For each iteration, the expected distribution of matrix scores under a random model is used to set a cutoff score for the inclusion of a segment in the next iteration. This cutoff may be calculated to allow the chance inclusion of either a fixed number or a fixed proportion of false positive segments. With sufficiently high cutoff scores, the procedure converged for all alignment blocks studied, with varying numbers of iterations required. Different methods for calculating weight matrices from alignment blocks were compared. The most effective of those tested was a logarithm-of-odds, Bayesian-based approach that used prior residue probabilities calculated from a mixture of Dirichlet distributions. The procedure described was used to detect novel conserved motifs of potential biological importance.
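The iterative procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several simplifying assumptions that are not in the source: a uniform background distribution, a flat pseudocount in place of the Dirichlet-mixture priors, a cutoff passed in by the caller rather than derived from the expected score distribution under a random model, only the best-scoring window kept per database sequence, and sequences restricted to the 20 standard residues. All function and parameter names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequencies (a real scan would use database frequencies).
BACKGROUND = {a: 1.0 / len(ALPHABET) for a in ALPHABET}


def build_pssm(block, pseudocount=1.0):
    """Build a log-odds position-dependent weight matrix from aligned segments.

    `block` is a list of equal-length segments. A flat pseudocount stands in
    for the Dirichlet-mixture priors mentioned in the abstract.
    """
    width = len(block[0])
    total = len(block) + pseudocount * len(ALPHABET)
    pssm = []
    for col in range(width):
        counts = Counter(seg[col] for seg in block)
        pssm.append({
            a: math.log2(((counts[a] + pseudocount) / total) / BACKGROUND[a])
            for a in ALPHABET
        })
    return pssm


def score(pssm, segment):
    """Sum the per-column log-odds scores of one ungapped segment."""
    return sum(col[res] for col, res in zip(pssm, segment))


def scan(pssm, sequences, cutoff):
    """Return the best-scoring window of each sequence that reaches the cutoff."""
    width = len(pssm)
    hits = set()
    for seq in sequences:
        windows = [seq[i:i + width] for i in range(len(seq) - width + 1)]
        if not windows:
            continue  # sequence shorter than the block width
        best = max(windows, key=lambda w: score(pssm, w))
        if score(pssm, best) >= cutoff:
            hits.add(best)
    return hits


def iterate_block(seed, sequences, cutoff, max_iter=20):
    """Rebuild the matrix and rescan until the block stops changing.

    Returns the final block (sorted) and the number of scans performed.
    """
    block = set(seed)
    for iteration in range(1, max_iter + 1):
        pssm = build_pssm(sorted(block))
        new_block = set(seed) | scan(pssm, sequences, cutoff)
        if new_block == block:
            return sorted(block), iteration
        block = new_block
    return sorted(block), max_iter
```

Calling `iterate_block(["GKSTL"], ["AAAGKSTLAAA", "CCGKTTLCC", "WWWWWWWWW"], cutoff=3.0)` on this toy database pulls the near-match `GKTTL` into the block on the first scan and stops after the second scan, when the block no longer changes. Raising the cutoff makes the loop stop sooner with a smaller block, which mirrors the abstract's observation that the procedure converged when the cutoff scores were sufficiently high.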


