Defining a similarity threshold for a functional protein sequence pattern: the signal peptide cleavage site.

Authors

Nielsen H, Engelbrecht J, von Heijne G, Brunak S

Affiliation

Center for Biological Sequence Analysis, Department of Physical Chemistry, The Technical University of Denmark, Lyngby.

Publication

Proteins. 1996 Feb;24(2):165-77. doi: 10.1002/(SICI)1097-0134(199602)24:2<165::AID-PROT4>3.0.CO;2-I.

PMID: 8820484

Abstract

When preparing data sets of amino acid or nucleotide sequences it is necessary to exclude redundant or homologous sequences in order to avoid overestimating the predictive performance of an algorithm. For some time methods for doing this have been available in the area of protein structure prediction. We have developed a similar procedure based on pair-wise alignments for sequences with functional sites. We show how a correlation coefficient between sequence similarity and functional homology can be used to compare the efficiency of different similarity measures and choose a nonarbitrary threshold value for excluding redundant sequences. The impact of the choice of scoring matrix used in the alignments is examined. We demonstrate that the parameter determining the quality of the correlation is the relative entropy of the matrix, rather than the assumed (PAM or identity) substitution mode. Results are presented for the case of prediction of cleavage sites in signal peptides. By inspection of the false positives, several errors in the database were found. The procedure presented may be used as a general outline for finding a problem-specific similarity measure and threshold value for analysis of other functional amino acid or nucleotide sequence patterns.
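The threshold-selection idea in the abstract can be illustrated with a minimal sketch. It assumes that each pair of sequences has an alignment score and a yes/no label for functional homology (for example, whether the two cleavage sites line up in the alignment), and it uses the Matthews correlation coefficient as the correlation measure. The abstract does not say which coefficient was used, so that choice, the function names `matthews` and `best_threshold`, and the toy data are all assumptions made for illustration.

```python
from math import sqrt


def matthews(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if any marginal count is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(pairs):
    """Return (threshold, coefficient) maximising the correlation.

    pairs: iterable of (similarity_score, functionally_homologous).
    A pair is called "similar" when its score is >= the threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    pairs = list(pairs)
    best = (None, -1.0)
    for t in sorted({score for score, _ in pairs}):
        tp = sum(1 for s, h in pairs if s >= t and h)
        fp = sum(1 for s, h in pairs if s >= t and not h)
        fn = sum(1 for s, h in pairs if s < t and h)
        tn = sum(1 for s, h in pairs if s < t and not h)
        c = matthews(tp, tn, fp, fn)
        if c > best[1]:
            best = (t, c)
    return best


# Hypothetical alignment scores with functional-homology labels.
toy_pairs = [(12, False), (18, False), (25, False),
             (31, True), (40, True), (22, True)]
threshold, coefficient = best_threshold(toy_pairs)
print(threshold, round(coefficient, 3))
```

Under these assumptions, pairs scoring at or above the chosen threshold would be treated as redundant, and one member of each such pair would be dropped from the data set. Repeating the scan with scores from different scoring matrices gives one way to compare similarity measures by their peak coefficient.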

Similar articles

1
Defining a similarity threshold for a functional protein sequence pattern: the signal peptide cleavage site.
Proteins. 1996 Feb;24(2):165-77. doi: 10.1002/(SICI)1097-0134(199602)24:2<165::AID-PROT4>3.0.CO;2-I.
2
An integrated approach to the analysis and modeling of protein sequences and structures. III. A comparative study of sequence conservation in protein structural families using multiple structural alignments.
J Mol Biol. 2000 Aug 18;301(3):691-711. doi: 10.1006/jmbi.2000.3975.
3
Analysis and prediction of functional sub-types from protein sequence alignments.
J Mol Biol. 2000 Oct 13;303(1):61-76. doi: 10.1006/jmbi.2000.4036.
4
From analysis of protein structural alignments toward a novel approach to align protein sequences.
Proteins. 2004 Feb 15;54(3):569-82. doi: 10.1002/prot.10503.
5
A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence.
J Mol Biol. 1997 Apr 11;267(4):1026-38. doi: 10.1006/jmbi.1997.0924.
6
Enriching the sequence substitution matrix by structural information.
Proteins. 2004 Jan 1;54(1):41-8. doi: 10.1002/prot.10474.
7
Use of residue pairs in protein sequence-sequence and sequence-structure alignments.
Protein Sci. 2000 Aug;9(8):1576-88. doi: 10.1110/ps.9.8.1576.
8
On the quality of tree-based protein classification.
Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.
9
NdPASA: a novel pairwise protein sequence alignment algorithm that incorporates neighbor-dependent amino acid propensities.
Proteins. 2005 Feb 15;58(3):628-37. doi: 10.1002/prot.20359.
10
Protein structure prediction based on sequence similarity.
Methods Mol Biol. 2009;569:129-56. doi: 10.1007/978-1-59745-524-4_7.

Cited by

1
Toward Understanding the Mechanism of Client-Selective Small Molecule Inhibitors of the Sec61 Translocon.
J Mol Recognit. 2025 Jan;38(1):e3108. doi: 10.1002/jmr.3108. Epub 2024 Oct 12.
2
SpanSeq: similarity-based sequence data splitting method for improved development and assessment of deep learning projects.
NAR Genom Bioinform. 2024 Aug 16;6(3):lqae106. doi: 10.1093/nargab/lqae106. eCollection 2024 Sep.
3
Construction of a Lectin-Glycan Interaction Network from Enterohemorrhagic Strains by Multi-omics Analysis.
Int J Mol Sci. 2020 Apr 12;21(8):2681. doi: 10.3390/ijms21082681.
4
A Brief History of Protein Sorting Prediction.
Protein J. 2019 Jun;38(3):200-216. doi: 10.1007/s10930-019-09838-3.
5
A Novel Proteome Microarray Discriminates Targets of Human Antibody Reactivity following Oral Vaccination and Experimental Challenge.
mSphere. 2018 Aug 1;3(4):e00260-18. doi: 10.1128/mSphere.00260-18.
6
Discovery of leucokinin-like neuropeptides that modulate a specific parameter of feeding motor programs in the molluscan model, .
J Biol Chem. 2017 Nov 17;292(46):18775-18789. doi: 10.1074/jbc.M117.795450. Epub 2017 Sep 18.
7
EuLoc: a web-server for accurately predict protein subcellular localization in eukaryotes by incorporating various features of sequence segments into the general form of Chou's PseAAC.
J Comput Aided Mol Des. 2013 Jan;27(1):91-103. doi: 10.1007/s10822-012-9628-0. Epub 2013 Jan 3.
8
Computational comparative study of tuberculosis proteomes using a model learned from signal peptide structures.
PLoS One. 2012;7(4):e35018. doi: 10.1371/journal.pone.0035018. Epub 2012 Apr 9.
9
Characterization and prediction of protein nucleolar localization sequences.
Nucleic Acids Res. 2010 Nov;38(21):7388-99. doi: 10.1093/nar/gkq653. Epub 2010 Jul 26.
10
A comprehensive assessment of N-terminal signal peptides prediction methods.
BMC Bioinformatics. 2009 Dec 3;10 Suppl 15(Suppl 15):S2. doi: 10.1186/1471-2105-10-S15-S2.