Repeatability in protein sequences.

Affiliations

Department of Computer Science, Faculty of Exact Sciences, University of Bejaia, 06000 Bejaia, Algeria.

Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany.

Publication information

J Struct Biol. 2019 Nov 1;208(2):86-91. doi: 10.1016/j.jsb.2019.08.003. Epub 2019 Aug 10.

DOI:10.1016/j.jsb.2019.08.003

PMID:31408700

Abstract

Low complexity regions (LCRs) in protein sequences have special properties that are very different from those of globular proteins. The rules that define secondary structure elements do not apply when the distribution of amino acids becomes biased. While there is a tendency towards structural disorder in LCRs, various examples, and particularly homorepeats of single amino acids, suggest that very short repeats could adopt structures that are very difficult to predict. These structures are possibly variable and dependent on the context of intra- or inter-molecular interactions. In general, short repeats in LCRs can induce structure. This could explain the observation that very short (non-perfect) repeats are widespread and that many define regions with a function in protein interactions. For these reasons, we have developed an algorithm to quickly analyze local repeatability along protein sequences, that is, how close a protein fragment is to a perfect repeat. Using this algorithm, we found that the proteins of the yeast Saccharomyces cerevisiae are depleted in short repeats (approximate or not) of odd length, while human proteins are not; that the fish Danio rerio has many proteins with repeats of length two; and that the plant Arabidopsis thaliana has an unusually large number of repeats of length seven. Our method (REpeatability Scanner, RES, accessible at http://cbdm-01.zdv.uni-mainz.de/~munoz/res/) makes it possible to find regions with approximate short repeats in protein sequences, and helps to characterize the variable use of LCRs and compositional bias in different organisms.
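
The abstract describes local repeatability only informally, as how close a fragment is to a perfect repeat, and does not give the scoring used by RES. The following is therefore a minimal sketch of one plausible way to quantify that idea, not the RES algorithm itself. It assumes a score defined as the fraction of positions i in a fragment whose residue equals the residue at position i + period; the function names `repeatability` and `scan`, the window size, and the example sequence are all hypothetical.

```python
from typing import List, Tuple


def repeatability(fragment: str, period: int) -> float:
    """Assumed score: fraction of positions i with fragment[i] == fragment[i + period].

    A perfect repeat of the given period scores 1.0. This definition is an
    illustrative assumption, not the published RES scoring.
    """
    pairs = len(fragment) - period
    if period < 1 or pairs < 1:
        raise ValueError("period must be >= 1 and shorter than the fragment")
    matches = sum(fragment[i] == fragment[i + period] for i in range(pairs))
    return matches / pairs


def scan(sequence: str, period: int, window: int) -> List[Tuple[int, float]]:
    """Slide a fixed-size window along the sequence and score each fragment.

    Returns (start, score) for every window start position (0-based).
    """
    return [
        (start, repeatability(sequence[start:start + window], period))
        for start in range(len(sequence) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence containing a perfect period-2 repeat (QA)x4.
    for start, score in scan("MKQAQAQAQALV", period=2, window=8):
        print(start, round(score, 2))
```

Under this assumed definition, a window that overlaps the repeat only partly scores below 1.0, so approximate repeats show up as local peaks along the sequence rather than as all-or-nothing hits.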

Similar articles

Repeatability in protein sequences.

J Struct Biol. 2019 Nov 1;208(2):86-91. doi: 10.1016/j.jsb.2019.08.003. Epub 2019 Aug 10.

Assessing the low complexity of protein sequences via the low complexity triangle.

PLoS One. 2020 Dec 30;15(12):e0239154. doi: 10.1371/journal.pone.0239154. eCollection 2020.

REP2: A Web Server to Detect Common Tandem Repeats in Protein Sequences.

J Mol Biol. 2021 May 28;433(11):166895. doi: 10.1016/j.jmb.2021.166895. Epub 2021 Feb 24.

dAPE: a web server to detect homorepeats and follow their evolution.

Bioinformatics. 2017 Apr 15;33(8):1221-1223. doi: 10.1093/bioinformatics/btw790.

A Novel algorithm for identifying low-complexity regions in a protein sequence.

Bioinformatics. 2006 Dec 15;22(24):2980-7. doi: 10.1093/bioinformatics/btl495. Epub 2006 Oct 2.

Detecting cryptically simple protein sequences using the SIMPLE algorithm.

Bioinformatics. 2002 May;18(5):672-8. doi: 10.1093/bioinformatics/18.5.672.

Rapid automatic detection and alignment of repeats in protein sequences.

Proteins. 2000 Nov 1;41(2):224-37. doi: 10.1002/1097-0134(20001101)41:2<224::aid-prot70>3.0.co;2-z.

Disentangling the complexity of low complexity proteins.

Brief Bioinform. 2020 Mar 23;21(2):458-472. doi: 10.1093/bib/bbz007.

ProtRepeatsDB: a database of amino acid repeats in genomes.

BMC Bioinformatics. 2006 Jul 7;7:336. doi: 10.1186/1471-2105-7-336.

Tracking repeats using significance and transitivity.

Bioinformatics. 2004 Aug 4;20 Suppl 1:i311-7. doi: 10.1093/bioinformatics/bth911.

Cited by

The nucleotide landscape of polyXY regions.

Comput Struct Biotechnol J. 2023 Oct 31;21:5408-5412. doi: 10.1016/j.csbj.2023.10.054. eCollection 2023.

Bioinformatics tools for the sequence complexity estimates.

Biophys Rev. 2023 Sep 15;15(5):1367-1378. doi: 10.1007/s12551-023-01140-y. eCollection 2023 Oct.

Evolutionary Study of Protein Short Tandem Repeats in Protein Families.

Biomolecules. 2023 Jul 13;13(7):1116. doi: 10.3390/biom13071116.

Interaction modules that impart specificity to disordered protein.

Trends Biochem Sci. 2023 May;48(5):477-490. doi: 10.1016/j.tibs.2023.01.004. Epub 2023 Feb 6.

Genome-wide survey of D/E repeats in human proteins uncovers their instability and aids in identifying their role in the chromatin regulator ATAD2.

iScience. 2022 Oct 31;25(12):105464. doi: 10.1016/j.isci.2022.105464. eCollection 2022 Dec 22.

Functional Tuning of Intrinsically Disordered Regions in Human Proteins by Composition Bias.

Biomolecules. 2022 Oct 15;12(10):1486. doi: 10.3390/biom12101486.

Search for Highly Divergent Tandem Repeats in Amino Acid Sequences.

Int J Mol Sci. 2021 Jul 1;22(13):7096. doi: 10.3390/ijms22137096.

Assessing the low complexity of protein sequences via the low complexity triangle.

PLoS One. 2020 Dec 30;15(12):e0239154. doi: 10.1371/journal.pone.0239154. eCollection 2020.

Evolutionary Study of Disorder in Protein Sequences.

Biomolecules. 2020 Oct 6;10(10):1413. doi: 10.3390/biom10101413.

Disordered Residues and Patterns in the Protein Data Bank.

Molecules. 2020 Mar 27;25(7):1522. doi: 10.3390/molecules25071522.
