C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families.

Authors

Austin Ryan S, Provart Nicholas J, Cutler Sean R

Affiliation

Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada.

Publication

BMC Genomics. 2007 Jun 26;8:191. doi: 10.1186/1471-2164-8-191.

DOI:10.1186/1471-2164-8-191
PMID:17594486
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1929074/
Abstract

BACKGROUND

The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs involved in protein sorting roles and dependent upon their proximity to the C-terminus for proper function have already been characterized. As a limited number of such motifs have been identified, the potential exists for genome-wide statistical analysis and comparative genomics to reveal novel peptide signatures functioning in a C-terminal dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs involving a simple z-statistic and several techniques for improving the signal-to-noise ratio.
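The abstract names a simple z-statistic but does not give its form. As a rough illustration only, the sketch below scores one C-terminal tripeptide against a binomial null built from pooled residue frequencies. The null model, the function name `cterm_tripeptide_z`, and the toy sequences are assumptions made for this example, not the authors' published procedure.

```python
import math
from collections import Counter


def cterm_tripeptide_z(sequences, tripeptide):
    """z-score for a tripeptide at the last three positions of each sequence.

    Assumed null model (not taken from the paper): each C-terminus is an
    independent draw whose probability is the product of pooled residue
    frequencies, so the count is binomial with parameters (n, p).
    """
    seqs = [s for s in sequences if len(s) >= 3]
    n = len(seqs)
    if n == 0:
        return 0.0

    # Observed number of sequences ending in the tripeptide.
    observed = sum(1 for s in seqs if s.endswith(tripeptide))

    # Pooled residue frequencies over all sequences.
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    p = 1.0
    for aa in tripeptide:
        p *= counts[aa] / total

    expected = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    if sd == 0:
        # Degenerate null (residue absent, or only one residue type present).
        return 0.0
    return (observed - expected) / sd


# Toy input: half of the sequences end in SKL.
toy = ["AAASKL"] * 5 + ["AAAAAA"] * 5
print(cterm_tripeptide_z(toy, "SKL"))
```

A large positive score means the tripeptide ends far more sequences than residue composition alone would suggest. Composition-based nulls are easily inflated by low-complexity tails and by homologous proteins, which is the kind of background noise the masking and family-clustering steps are meant to reduce.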

RESULTS

We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking were applied to the successful reduction of background noise. Similarly, as C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed prior to the analysis in order to assess tripeptide over-representation across protein families as opposed to across all proteins. Finally, comparative genomics was used to identify tripeptides significantly occurring in multiple species. This approach has been able to predict, to our knowledge, all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*), the ER-retention signal (K/HDEL*), the ER-retrieval signal for membrane bound proteins (KKxx*), the prenylation signal (CC*) and the CaaX box prenylation motif. In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists between species, among kingdoms and across eukaryotes. Motifs of note include a serine-acidic peptide (DSD*) as well as several lysine enriched motifs found in nearly all eukaryotic genomes examined.
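The steps described in this section (counting across families rather than across proteins, scoring against randomized sequences, and keeping tripeptides that recur in several species) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: family identifiers are taken as given, the randomization is a plain per-sequence shuffle, and the cutoff values and all function names are hypothetical. The paper's actual clustering method, randomization models, and thresholds are not specified in the abstract.

```python
import random
import statistics
from collections import defaultdict


def family_counts(proteins):
    """Number of distinct families in which each C-terminal tripeptide occurs.

    `proteins` is an iterable of (family_id, sequence) pairs, so a large
    family sharing one C-terminus contributes a single count.
    """
    families = defaultdict(set)
    for family_id, seq in proteins:
        if len(seq) >= 3:
            families[seq[-3:]].add(family_id)
    return {tri: len(fams) for tri, fams in families.items()}


def shuffled(seq, rng):
    """Return the sequence with its residues randomly permuted."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def family_z_scores(proteins, n_shuffles=200, seed=0):
    """z-score of observed family counts against shuffled-sequence counts."""
    proteins = list(proteins)
    rng = random.Random(seed)
    observed = family_counts(proteins)

    # Build the null distribution by shuffling every sequence repeatedly.
    null = defaultdict(list)
    for _ in range(n_shuffles):
        counts = family_counts((fam, shuffled(seq, rng)) for fam, seq in proteins)
        for tri in observed:
            null[tri].append(counts.get(tri, 0))

    scores = {}
    for tri, obs in observed.items():
        mean = statistics.fmean(null[tri])
        sd = statistics.pstdev(null[tri])
        if sd > 0:
            scores[tri] = (obs - mean) / sd
        else:
            # The null never varied; treat any excess as unbounded.
            scores[tri] = float("inf") if obs > mean else 0.0
    return scores


def shared_hits(scores_by_species, z_cutoff=3.0, min_species=2):
    """Tripeptides whose z-score reaches the cutoff in enough species."""
    hits = defaultdict(int)
    for scores in scores_by_species.values():
        for tri, z in scores.items():
            if z >= z_cutoff:
                hits[tri] += 1
    return sorted(tri for tri, n in hits.items() if n >= min_species)
```

Collapsing to one count per family is what keeps a single expanded gene family from dominating the ranking, and the cross-species filter plays the role of the comparative genomics step: a tripeptide that scores highly in only one proteome is treated as weaker evidence than one that recurs.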

CONCLUSION

We have successfully generated a high confidence representation of eukaryotic motifs anchored at the C-terminus. A high incidence of true-positives in our results suggests that several previously unidentified tripeptide patterns are strong candidates for representing novel peptide motifs of a widely employed nature in the C-terminal biology of eukaryotes. Our application of comparative genomics, statistical over-representation and the adjustment for protein family homology has generated several hypotheses concerning the C-terminal topology as it pertains to sorting and potential protein interaction signals. This approach to background reduction could be expanded for application to protein motif prediction in the protein interior. A parallel N-terminal analysis is presented as supplementary data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee4f/1929074/763ba268cb0f/1471-2164-8-191-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee4f/1929074/a39113e11a68/1471-2164-8-191-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee4f/1929074/6283cc91bca3/1471-2164-8-191-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee4f/1929074/a23b8c862558/1471-2164-8-191-3.jpg

Similar articles

1
C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families.
BMC Genomics. 2007 Jun 26;8:191. doi: 10.1186/1471-2164-8-191.
2
N-terminal N-myristoylation of proteins: prediction of substrate proteins from amino acid sequence.
J Mol Biol. 2002 Apr 5;317(4):541-57. doi: 10.1006/jmbi.2002.5426.
3
Non-canonical peroxisome targeting signals: identification of novel PTS1 tripeptides and characterization of enhancer elements by computational permutation analysis.
BMC Plant Biol. 2012 Aug 11;12:142. doi: 10.1186/1471-2229-12-142.
4
N-terminal N-myristoylation of proteins: refinement of the sequence motif and its taxon-specific differences.
J Mol Biol. 2002 Apr 5;317(4):523-40. doi: 10.1006/jmbi.2002.5425.
5
Functional neighbors: inferring relationships between nonhomologous protein families using family-specific packing motifs.
IEEE Trans Inf Technol Biomed. 2010 Sep;14(5):1137-43. doi: 10.1109/TITB.2010.2053550. Epub 2010 Jun 21.
6
Nonrandom tripeptide sequence distributions at protein carboxyl termini.
Genome Res. 2003 Apr;13(4):617-23. doi: 10.1101/gr.667603.
7
Distance-based identification of structure motifs in proteins using constrained frequent subgraph mining.
Comput Syst Bioinformatics Conf. 2006:227-38.
8
Identification of proteolytic products and natural protein N-termini by Terminal Amine Isotopic Labeling of Substrates (TAILS).
Methods Mol Biol. 2011;753:273-87. doi: 10.1007/978-1-61779-148-2_18.
9
Comparative genomics on Wnt16 orthologs.
Oncol Rep. 2005 Apr;13(4):771-5.
10
Fast model-based protein homology detection without alignment.
Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.

Cited by

1
The carboxy-terminus, a key regulator of protein function.
Crit Rev Biochem Mol Biol. 2019 Apr;54(2):85-102. doi: 10.1080/10409238.2019.1586828. Epub 2019 May 20.
2
Isolation and Characterization of Reticuline N-Methyltransferase Involved in Biosynthesis of the Aporphine Alkaloid Magnoflorine in Opium Poppy.
J Biol Chem. 2016 Nov 4;291(45):23416-23427. doi: 10.1074/jbc.M116.750893. Epub 2016 Sep 15.
3
The Functional Human C-Terminome.
PLoS One. 2016 Apr 6;11(4):e0152731. doi: 10.1371/journal.pone.0152731. eCollection 2016.
4
A frequency-based linguistic approach to protein decoding and design: Simple concepts, diverse applications, and the SCS Package.
Comput Struct Biotechnol J. 2013 Mar 29;5:e201302010. doi: 10.5936/csbj.201302010. eCollection 2013.
5
DLocalMotif: a discriminative approach for discovering local motifs in protein sequences.
Bioinformatics. 2013 Jan 1;29(1):39-46. doi: 10.1093/bioinformatics/bts654. Epub 2012 Nov 9.
6
Tandem termination signal in plant mRNAs.
Gene. 2011 Jul 15;481(1):1-6. doi: 10.1016/j.gene.2011.04.002. Epub 2011 Apr 22.

References

1
Functional grouping based on signatures in protein termini.
Proteins. 2006 Jun 1;63(4):996-1004. doi: 10.1002/prot.20903.
2
Progress in protein structural class prediction and its impact to bioinformatics and proteomics.
Curr Protein Pept Sci. 2005 Oct;6(5):423-36. doi: 10.2174/138920305774329368.
3
Refinement and prediction of protein prenylation motifs.
Genome Biol. 2005;6(6):R55. doi: 10.1186/gb-2005-6-6-r55. Epub 2005 May 27.
4
Recent developments in structural proteomics for protein structure determination.
Proteomics. 2005 May;5(8):2056-68. doi: 10.1002/pmic.200401104.
5
Genome wide analysis of Arabidopsis core promoters.
BMC Genomics. 2005 Feb 25;6:25. doi: 10.1186/1471-2164-6-25.
6
Availability of short amino acid sequences in proteins.
Protein Sci. 2005 Mar;14(3):617-25. doi: 10.1110/ps.041092605. Epub 2005 Feb 2.
7
Assessing computational tools for the discovery of transcription factor binding sites.
Nat Biotechnol. 2005 Jan;23(1):137-44. doi: 10.1038/nbt1053.
8
p53 linear diffusion along DNA requires its C terminus.
Mol Cell. 2004 Nov 5;16(3):413-24. doi: 10.1016/j.molcel.2004.09.032.
9
Detecting DNA regulatory motifs by incorporating positional trends in information content.
Genome Biol. 2004;5(7):R50. doi: 10.1186/gb-2004-5-7-r50. Epub 2004 Jun 24.
10
A C-terminal determinant of GluR6 kainate receptor trafficking.
J Neurosci. 2004 Jan 21;24(3):679-91. doi: 10.1523/JNEUROSCI.4985-03.2004.