Automatic generation of primary sequence patterns from sets of related protein sequences.

Authors

Smith R F, Smith T F

Affiliation

Department of Biostatistics, Dana-Farber Cancer Institute, Boston, MA 02115.

Publication

Proc Natl Acad Sci U S A. 1990 Jan;87(1):118-22. doi: 10.1073/pnas.87.1.118.

DOI: 10.1073/pnas.87.1.118
PMID: 2296575
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC53211/
Abstract

We have developed a computer algorithm that can extract the pattern of conserved primary sequence elements common to all members of a homologous protein family. The method involves clustering the pairwise similarity scores among a set of related sequences to generate a binary dendrogram (tree). The tree is then reduced in a stepwise manner by progressively replacing the node connecting the two most similar termini by one common pattern until only a single common "root" pattern remains. A pattern is generated at a node by (i) performing a local optimal alignment on the sequence/pattern pair connected by the node with the use of an extended dynamic programming algorithm and then (ii) constructing a single common pattern from this alignment with a nested hierarchy of amino acid classes to identify the minimal inclusive amino acid class covering each paired set of elements in the alignment. Gaps within an alignment are created and/or extended using a "pay once" gap penalty rule, and gapped positions are converted into gap characters that function as 0 or 1 amino acid of any type during subsequent alignment. This method has been used to generate a library of covering patterns for homologous families in the National Biomedical Research Foundation/Protein Identification Resource protein sequence data base. We show that a covering pattern can be more diagnostic for sequence family membership than any of the individual sequences used to construct the pattern.
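
The tree-reduction loop described above can be sketched as a greedy, average-linkage agglomeration in which every join immediately replaces the two most similar termini with one common pattern. This minimal sketch rests on assumptions that the abstract does not state: the input sequences are already aligned and have equal length, the pairwise score is a simple identity count (`identity`), average linkage stands in for the unspecified clustering method, and a column-wise union of residue sets stands in for the alignment and amino acid class steps. All function names are hypothetical.

```python
from itertools import combinations


def identity(a, b):
    """Hypothetical pairwise score: identical residues in two pre-aligned, equal-length sequences."""
    return sum(x == y for x, y in zip(a, b))


def reduce_tree(seqs):
    """Join the two most similar termini, replacing them with one common pattern, until one root remains."""
    score = {(i, j): identity(seqs[i], seqs[j])
             for i, j in combinations(range(len(seqs)), 2)}

    def linkage(a, b):
        # Average linkage: mean original pairwise score between members of two clusters.
        pairs = [(min(i, j), max(i, j)) for i in a for j in b]
        return sum(score[p] for p in pairs) / len(pairs)

    # Each cluster is (member indices, pattern as a list of residue sets, one per column).
    clusters = [([i], [frozenset(c) for c in s]) for i, s in enumerate(seqs)]
    joins = []
    while len(clusters) > 1:
        x, y = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]][0], clusters[ij[1]][0]))
        (ma, pa), (mb, pb) = clusters[x], clusters[y]
        # Stand-in for pattern construction: column-wise union of residue sets.
        merged = (ma + mb, [ca | cb for ca, cb in zip(pa, pb)])
        joins.append(sorted(merged[0]))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters[0][1], joins


def render(pattern):
    """Show invariant columns as a residue and variable columns as a bracketed set."""
    return "".join(next(iter(c)) if len(c) == 1 else "[" + "".join(sorted(c)) + "]"
                   for c in pattern)


pattern, joins = reduce_tree(["ACDEK", "ACDEH", "GCDFH"])
print(render(pattern), joins)
```

For these three toy sequences, the closest pair (indices 0 and 1) is joined first. The root pattern is `[AG]CD[EF][HK]`, in which bracketed sets mark positions that are not invariant across the family.
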

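Step (i), the local optimal alignment under a "pay once" gap rule, can be illustrated with a Smith-Waterman-style dynamic program. Here "pay once" is read as a single, length-independent penalty per gap and implemented as Gotoh's affine recurrence with a zero extension cost. That reading, the scoring values (`match`, `mismatch`, `gap`), and the function name are assumptions. The paper's extended algorithm, which also aligns patterns against sequences, is not reproduced, and the sketch returns only the best local score.

```python
def local_score_pay_once(a, b, match=2, mismatch=-1, gap=3):
    """Best local alignment score where every gap costs `gap` once, whatever its length."""
    NEG = float("-inf")
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best local alignment ending at (i, j)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # ... ending with b[j-1] opposite a gap
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # ... ending with a[i-1] opposite a gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a gap (pay once) or extend an existing gap at no extra cost.
            E[i][j] = max(H[i][j - 1] - gap, E[i][j - 1])
            F[i][j] = max(H[i - 1][j] - gap, F[i - 1][j])
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(local_score_pay_once("ACDEFGH", "ACDGH"))
```

For `ACDEFGH` against `ACDGH`, the two-residue gap is charged 3 only once, so the gapped alignment scores 5 x 2 - 3 = 7. Raising the gap charge to 5 makes the ungapped `ACD` match (score 6) the better local alignment.
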

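Step (ii), choosing the minimal inclusive amino acid class for each aligned pair, reduces to finding the smallest class in a nested hierarchy that contains every residue covered by both elements. The sketch below uses a small hypothetical hierarchy with lowercase class symbols (for example `l` for aliphatic). It is not the hierarchy used in the paper, and all names are illustrative only. Because any two classes are either disjoint or nested, the covering classes form a chain, so the smallest one is unique.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical nested hierarchy: any two classes are disjoint or one contains the other.
CLASSES = {
    "x": frozenset(AMINO_ACIDS),  # any residue
    "h": frozenset("ACFILMVWY"),  # hydrophobic
    "a": frozenset("FWY"),        # aromatic
    "l": frozenset("ILMV"),       # aliphatic
    "p": frozenset("DEHKNQRST"),  # polar
    "c": frozenset("DEHKR"),      # charged
    "d": frozenset("DE"),         # acidic
    "b": frozenset("HKR"),        # basic
}


def members(symbol):
    """Residues covered by a pattern element: a class symbol or a single residue."""
    return CLASSES.get(symbol, frozenset({symbol}))


def minimal_inclusive_class(elements):
    """Smallest residue or class covering every residue of the paired elements."""
    covered = frozenset().union(*(members(e) for e in elements))
    if len(covered) == 1:
        return next(iter(covered))
    # Covering classes form a chain in a nested hierarchy; pick the smallest.
    return min((len(s), name) for name, s in CLASSES.items() if covered <= s)[1]


print(minimal_inclusive_class(["I", "V"]), minimal_inclusive_class(["l", "F"]))
```

Pairing `I` with `V` gives the aliphatic class `l`. Pairing that class with `F` widens it to `h`, and pairing `G` with `A` falls back to the any-residue class `x`.
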

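In a finished covering pattern, each gap character acts as 0 or 1 residue of any type. One simple way to see this behaviour is to translate the pattern into a regular expression, turning each class into a character set and each gap character into an optional any-residue position. This is a simplification: the paper scores patterns by alignment, not by exact regex matching. The class symbols and the `.` gap character used here are hypothetical.

```python
import re

RESIDUES = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical class symbols; not the hierarchy used in the paper.
CLASSES = {"l": "ILMV", "a": "FWY", "b": "HKR", "d": "DE", "x": RESIDUES}
GAP = "."  # hypothetical gap character: 0 or 1 residue of any type


def pattern_to_regex(pattern):
    """Translate a covering pattern into a compiled regular expression."""
    parts = []
    for sym in pattern:
        if sym == GAP:
            parts.append("[" + RESIDUES + "]?")     # optional residue of any type
        elif sym in CLASSES:
            parts.append("[" + CLASSES[sym] + "]")  # any member of the class
        else:
            parts.append(re.escape(sym))            # literal residue
    return re.compile("".join(parts))


def matches(pattern, sequence):
    """True if the pattern occurs anywhere in the sequence."""
    return pattern_to_regex(pattern).search(sequence) is not None


print(matches("Gl.Kd", "MMGVKEA"), matches("Gl.Kd", "MMGVAAKEA"))
```

Because of the gap character, `Gl.Kd` accepts both `GVKE` and `GVAKE` but rejects `GVAAKE`.
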
Similar articles

1
Hierarchical method to align large numbers of biological sequences.
Methods Enzymol. 1990;183:456-74. doi: 10.1016/0076-6879(90)83031-4.
2
A novel randomized iterative strategy for aligning multiple protein sequences.
Comput Appl Biosci. 1991 Oct;7(4):479-84. doi: 10.1093/bioinformatics/7.4.479.
3
A non-local gap-penalty for profile alignment.
Bull Math Biol. 1996 Jan;58(1):1-18. doi: 10.1007/BF02458279.
4
Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model.
BMC Bioinformatics. 2004 Oct 25;5:157. doi: 10.1186/1471-2105-5-157.
5
An integrated approach to the analysis and modeling of protein sequences and structures. III. A comparative study of sequence conservation in protein structural families using multiple structural alignments.
J Mol Biol. 2000 Aug 18;301(3):691-711. doi: 10.1006/jmbi.2000.3975.
6
Flexible protein sequence patterns. A sensitive method to detect weak structural similarities.
J Mol Biol. 1990 Mar 20;212(2):389-402. doi: 10.1016/0022-2836(90)90133-7.
7
Profile analysis: detection of distantly related proteins.
Proc Natl Acad Sci U S A. 1987 Jul;84(13):4355-8. doi: 10.1073/pnas.84.13.4355.
8
Clustering of domains of functionally related enzymes in the interaction database PRECISE by the generation of primary sequence patterns.
J Mol Graph Model. 2006 May;24(6):426-33. doi: 10.1016/j.jmgm.2005.08.004. Epub 2005 Oct 10.
9
Parametric sequence comparisons.
Proc Natl Acad Sci U S A. 1992 Jul 1;89(13):6090-3. doi: 10.1073/pnas.89.13.6090.

Articles citing this paper

1
Taxonomic quasi-primes: peptides charting lineage-specific adaptations and disease-relevant loci.
Protein Sci. 2025 Sep;34(9):e70241. doi: 10.1002/pro.70241.
2
Research progress of reduced amino acid alphabets in protein analysis and prediction.
Comput Struct Biotechnol J. 2022 Jul 4;20:3503-3510. doi: 10.1016/j.csbj.2022.07.001. eCollection 2022.
3
TGF-β Prodomain Alignments Reveal Unexpected Cysteine Conservation Consistent with Phylogenetic Predictions of Cross-Subfamily Heterodimerization.
Genetics. 2020 Feb;214(2):447-465. doi: 10.1534/genetics.119.302255. Epub 2019 Dec 16.
4
Transgenic Analyses in Drosophila Reveal That mCORL1 Is Functionally Distinct from mCORL2 and dCORL.
G3 (Bethesda). 2019 Nov 5;9(11):3781-3789. doi: 10.1534/g3.119.400647.
5
Tracking interspecies transmission and long-term evolution of an ancient retrovirus using the genomes of modern mammals.
Elife. 2016 Mar 8;5:e12704. doi: 10.7554/eLife.12704.
6
Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring.
EURASIP J Bioinform Syst Biol. 2012 Jul 13;2012(1):8. doi: 10.1186/1687-4153-2012-8.
7
PhyloMap: an algorithm for visualizing relationships of large sequence data sets and its application to the influenza A virus genome.
BMC Bioinformatics. 2011 Jun 20;12:248. doi: 10.1186/1471-2105-12-248.
8
Optimized ancestral state reconstruction using Sankoff parsimony.
BMC Bioinformatics. 2009 Feb 7;10:51. doi: 10.1186/1471-2105-10-51.
9
A reduced amino acid alphabet for understanding and designing protein adaptation to mutation.
Eur Biophys J. 2007 Nov;36(8):1059-69. doi: 10.1007/s00249-007-0188-5. Epub 2007 Jun 13.
10
SCANMOT: searching for similar sequences using a simultaneous scan of multiple sequence motifs.
Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W274-6. doi: 10.1093/nar/gki493.