Protein sequence similarity searches using patterns as seeds.

Authors

Zhang Z, Schäffer A A, Miller W, Madden T L, Lipman D J, Koonin E V, Altschul S F

Affiliation

Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA.

Publication

Nucleic Acids Res. 1998 Sep 1;26(17):3986-90. doi: 10.1093/nar/26.17.3986.

PMID:9705509

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC147803/

Abstract

Protein families often are characterized by conserved sequence patterns or motifs. A researcher frequently wishes to evaluate the significance of a specific pattern within a protein, or to exploit knowledge of known motifs to aid the recognition of greatly diverged but homologous family members. To assist in these efforts, the pattern-hit initiated BLAST (PHI-BLAST) program described here takes as input both a protein sequence and a pattern of interest that it contains. PHI-BLAST searches a protein database for other instances of the input pattern, and uses those found as seeds for the construction of local alignments to the query sequence. The random distribution of PHI-BLAST alignment scores is studied analytically and empirically. In many instances, the program is able to detect statistically significant similarity between homologous proteins that are not recognizably related using traditional single-pass database search methods. PHI-BLAST is applied to the analysis of CED4-like cell death regulators, HS90-type ATPase domains, archaeal tRNA nucleotidyltransferases and archaeal homologs of DnaG-type DNA primases.
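The pattern-seeding step described in the abstract can be illustrated with a minimal sketch. It is not the PHI-BLAST implementation: it assumes the pattern is written in a small subset of PROSITE-style syntax, and the function names `prosite_to_regex` and `find_seeds` and the toy sequences are hypothetical. It only locates pattern occurrences that could serve as seeds; it does not build or score the local alignments.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a small subset of PROSITE-style syntax into a Python regex.

    Supported here (an assumption of this sketch, not a full parser):
    single residues, [ABC] sets, {ABC} exclusions, x wildcards, and
    repeat counts written as (n) or (n,m).
    """
    parts = []
    for element in pattern.strip(".").split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_seeds(pattern: str, sequences: dict) -> list:
    """Return (name, start, end) for every pattern occurrence, 0-based, end exclusive.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    hits = []
    for name, seq in sequences.items():
        for m in regex.finditer(seq):
            hits.append((name, m.start(1), m.end(1)))
    return hits


# Toy database (hypothetical sequences, for illustration only).
toy_db = {
    "toy1": "MSGPPGAGKSTLA",
    "toy2": "MALWTRLVPL",
}
seeds = find_seeds("[AG]-x(4)-G-K-[ST]", toy_db)
print(seeds)
```

Each reported occurrence marks a position in a database sequence where an alignment to the query could be anchored; in this sketch the extension and significance estimation are left out entirely.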

Similar articles

Protein sequence similarity searches using patterns as seeds.

Nucleic Acids Res. 1998 Sep 1;26(17):3986-90. doi: 10.1093/nar/26.17.3986.

A new seed selection algorithm that maximizes local structural similarity in proteins.

Conf Proc IEEE Eng Med Biol Soc. 2006;2006:5822-5. doi: 10.1109/IEMBS.2006.259338.

SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters.

BMC Bioinformatics. 2004 Oct 28;5:171. doi: 10.1186/1471-2105-5-171.

Detection of homologous proteins by an intermediate sequence search.

Protein Sci. 2004 Jan;13(1):54-62. doi: 10.1110/ps.03335004.

PHOG-BLAST--a new generation tool for fast similarity search of protein families.

BMC Evol Biol. 2006 Jun 22;6:51. doi: 10.1186/1471-2148-6-51.

Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model.

BMC Bioinformatics. 2004 Oct 25;5:157. doi: 10.1186/1471-2105-5-157.

A comparison of position-specific score matrices based on sequence and structure alignments.

Protein Sci. 2002 Feb;11(2):361-70. doi: 10.1110/ps.19902.

Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches.

J Mol Biol. 1999 Apr 16;287(5):1023-40. doi: 10.1006/jmbi.1999.2653.

Recent Hits Acquired by BLAST (ReHAB): a tool to identify new hits in sequence similarity searches.

BMC Bioinformatics. 2005 Feb 8;6:23. doi: 10.1186/1471-2105-6-23.

FASTA-SWAP and FASTA-PAT: pattern database searches using combinations of aligned amino acids, and a novel scoring theory.

J Mol Biol. 1996 Jun 21;259(4):840-54. doi: 10.1006/jmbi.1996.0362.

Cited by

Biosynthesis of Biphenomycin-like Macrocyclic Peptides by Formation and Cross-Linking of -Tyrosines.

J Am Chem Soc. 2025 Jul 9;147(27):23781-23796. doi: 10.1021/jacs.5c06044. Epub 2025 Jun 26.

A dynamic histone-based chromatin regulatory toolkit underpins genome and developmental evolution in an invertebrate clade.

Genome Biol. 2025 Jun 10;26(1):160. doi: 10.1186/s13059-025-03626-2.

Proteomic profiling of small extracellular vesicles from bovine nucleus pulposus cells.

PLoS One. 2025 May 29;20(5):e0324179. doi: 10.1371/journal.pone.0324179. eCollection 2025.

Large-scale transcriptome mining enables macrocyclic diversification and improved bioactivity of the stephanotic acid scaffold.

Nat Commun. 2025 May 6;16(1):4198. doi: 10.1038/s41467-025-59428-4.

Biosynthesis of Macrocyclic Peptides by Formation and Crosslinking of -Tyrosines.

bioRxiv. 2025 Apr 8:2025.04.04.647296. doi: 10.1101/2025.04.04.647296.

Characterizing phenotype variants of Cercosporidium personatum, causal agent of peanut late leaf spot disease, their morphology, genetics and metabolites.

Sci Rep. 2025 Jan 9;15(1):1405. doi: 10.1038/s41598-025-85953-9.

A novel in-silico model explores LanM homologs among Hyphomicrobium spp.

Commun Biol. 2024 Nov 20;7(1):1539. doi: 10.1038/s42003-024-07258-3.

Evolution of pH-sensitive transcription termination in during adaptation to repeated long-term starvation.

Proc Natl Acad Sci U S A. 2024 Sep 24;121(39):e2405546121. doi: 10.1073/pnas.2405546121. Epub 2024 Sep 19.

Evolution of pH-sensitive transcription termination during adaptation to repeated long-term starvation.

bioRxiv. 2024 Mar 1:2024.03.01.582989. doi: 10.1101/2024.03.01.582989.

Slc11 Synapomorphy: A Conserved 3D Framework Articulating Carrier Conformation Switch.

Int J Mol Sci. 2023 Oct 11;24(20):15076. doi: 10.3390/ijms242015076.

References

Optimal sequence alignments.

Proc Natl Acad Sci U S A. 1983 Mar;80(5):1382-6. doi: 10.1073/pnas.80.5.1382.

Alignments without low-scoring regions.

J Comput Biol. 1998 Summer;5(2):197-210. doi: 10.1089/cmb.1998.5.197.

Generalized affine gap costs for protein sequence alignment.

Proteins. 1998 Jul 1;32(1):88-96.

Empirical statistical estimates for sequence similarity searches.

J Mol Biol. 1998 Feb 13;276(1):71-84. doi: 10.1006/jmbi.1997.1525.

GenBank.

Nucleic Acids Res. 1998 Jan 1;26(1):1-7. doi: 10.1093/nar/26.1.1.

Cytochrome c and dATP-dependent formation of Apaf-1/caspase-9 complex initiates an apoptotic protease cascade.

Cell. 1997 Nov 14;91(4):479-89. doi: 10.1016/s0092-8674(00)80434-1.

The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus.

Nature. 1997 Nov 27;390(6658):364-70. doi: 10.1038/37052.

Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea.

Mol Microbiol. 1997 Aug;25(4):619-37. doi: 10.1046/j.1365-2958.1997.4821861.x.

Role of CED-4 in the activation of CED-3.

Nature. 1997 Aug 21;388(6644):728-9. doi: 10.1038/41913.

Apaf-1, a human protein homologous to C. elegans CED-4, participates in cytochrome c-dependent activation of caspase-3.

Cell. 1997 Aug 8;90(3):405-13. doi: 10.1016/s0092-8674(00)80501-2.
