Globally, unrelated protein sequences appear random.

Affiliation

Department of Biochemistry and Molecular Genetics, University of Virginia, Jordan Hall Box 800733, Charlottesville, VA 22908, USA.

Publication information

Bioinformatics. 2010 Feb 1;26(3):310-8. doi: 10.1093/bioinformatics/btp660. Epub 2009 Nov 30.

Abstract

MOTIVATION

To test whether protein folding constraints and secondary structure sequence preferences significantly reduce the space of amino acid words in proteins, we compared the frequencies of four- and five-amino acid word clumps (independent words) in proteins to the frequencies predicted by four random sequence models.
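The abstract does not spell out how clumps or model expectations are computed. Below is a minimal Python sketch that makes the two ideas concrete. It rests on two assumptions. First, it uses the usual pattern-statistics definition of a clump: a maximal chain of overlapping occurrences of the same word, so a self-overlapping word such as AAAA counts once in AAAAAA. Second, it takes MC(0) to be a zero-order Markov model, in which residues are drawn independently at their pooled frequencies. The names `count_clumps` and `mc0_expected` are hypothetical. The paper's other three random models are not reproduced.

```python
from collections import Counter
from math import prod


def count_clumps(seqs, k):
    """Count clumps of every k-letter word across `seqs` (hypothetical helper).

    A clump is assumed to be a maximal chain of overlapping occurrences of the
    same word, so AAAA occurring at positions 0, 1 and 2 of AAAAAA counts once.
    """
    clumps = Counter()
    for seq in seqs:
        last_start = {}  # word -> start of its most recent occurrence in seq
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            prev = last_start.get(word)
            # Start a new clump unless this occurrence overlaps the previous one.
            if prev is None or i - prev >= k:
                clumps[word] += 1
            last_start[word] = i
    return clumps


def mc0_expected(seqs, word):
    """Expected occurrences of `word` under an assumed zero-order Markov model.

    Residues are treated as independent draws from the pooled composition, so
    the per-position probability of the word is the product of its residue
    frequencies, multiplied by the number of positions where it could start.
    """
    k = len(word)
    composition = Counter("".join(seqs))
    total = sum(composition.values())
    p_word = prod(composition[a] / total for a in word)
    n_positions = sum(max(len(s) - k + 1, 0) for s in seqs)
    return n_positions * p_word


if __name__ == "__main__":
    seqs = ["MKTAYIAKQRQISFVKSHFSRQ", "AAAAAAGSGSGS"]
    observed = count_clumps(seqs, 4)
    for word in ("AAAA", "GSGS"):
        print(word, observed[word], round(mc0_expected(seqs, word), 4))
```

Words that can overlap themselves (AAAA, GSGS) have fewer expected clumps than the expected number of occurrences that `mc0_expected` returns. A careful observed-versus-expected comparison would correct for this. The sketch leaves that correction out.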

RESULTS

While the human proteome has many overrepresented word clumps, these words come from large protein families with biased compositions (e.g. Zn-fingers). In contrast, in a non-redundant sample of Pfam-AB, only 1% of four-amino acid word clumps (4.7% of 5mer words) are 2-fold overrepresented compared with our simplest random model [MC(0)], and 0.1% (4mers) to 0.5% (5mers) are 2-fold overrepresented compared with a window-shuffled random model. Using a false discovery rate q-value analysis, the number of exceptional four- or five-letter words in real proteins is similar to the number found when comparing words from one random model to another. Consensus overrepresented words are not enriched in conserved regions of proteins, but four-letter words are enriched 1.18- to 1.56-fold in alpha-helical secondary structures (but not beta-strands). Five-residue consensus exceptional words are enriched for alpha-helix 1.43- to 1.61-fold. Protein word preferences in regular secondary structure do not appear to significantly restrict the use of sequence words in unrelated proteins, although the consensus exceptional words have a secondary structure bias for alpha-helix. Globally, words in protein sequences appear to be under very few constraints; for the most part, they appear to be random.
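One way to flag "exceptional" words is sketched below in Python, under stated assumptions. Observed clump counts are compared with a random model's expected counts. Each word gets a Poisson upper-tail p-value, and the p-values are converted to false discovery rate q-values with the Benjamini-Hochberg procedure. Several pieces are illustrative assumptions rather than the authors' implementation: the Poisson null, the Benjamini-Hochberg adjustment (the paper's q-value method may differ, for example Storey's approach), and the names `poisson_sf`, `bh_qvalues` and `exceptional_words`. The default 2-fold threshold mirrors the one quoted above.

```python
from math import exp, lgamma, log


def poisson_sf(obs, lam):
    """P(X >= obs) for X ~ Poisson(lam); the Poisson null is an assumption."""
    if obs <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # Sum the probability mass for 0..obs-1 in log space, then take the complement.
    cdf = sum(exp(x * log(lam) - lam - lgamma(x + 1)) for x in range(obs))
    return max(0.0, 1.0 - cdf)


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values, used here as stand-in FDR q-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def exceptional_words(observed, expected, fold=2.0, alpha=0.05):
    """Words at least `fold` times overrepresented with q <= alpha (hypothetical helper).

    `observed` and `expected` map each word to its clump count and to its
    (positive) expected count under some random sequence model.
    """
    words = sorted(observed)
    pvals = [poisson_sf(observed[w], expected[w]) for w in words]
    qvals = bh_qvalues(pvals)
    hits = {}
    for w, q in zip(words, qvals):
        ratio = observed[w] / expected[w]
        if ratio >= fold and q <= alpha:
            hits[w] = (ratio, q)
    return hits


if __name__ == "__main__":
    # Toy counts: WWCW is 3-fold overrepresented but too rare to pass the FDR cut.
    observed = {"AAAA": 40, "ACDE": 10, "WWCW": 3}
    expected = {"AAAA": 10.0, "ACDE": 10.0, "WWCW": 1.0}
    print(exceptional_words(observed, expected))
```

The abstract reports that real proteins yield about as many exceptional words as a comparison of one random model against another. A natural control with this sketch is therefore to pass counts from one random model as `observed` and expectations from another as `expected`.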

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
