Institute of Biophysics, Academy of Sciences of the Czech Republic v.v.i., Královopolská 135, 612 65 Brno, Czech Republic.
Department of Biology and Ecology/Institute of Environmental Technologies, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic.
Molecules. 2018 Sep 13;23(9):2341. doi: 10.3390/molecules23092341.
The role of local DNA structures in the regulation of basic cellular processes is an emerging field of research. Among local non-B DNA structures, G-quadruplexes are perhaps the best characterized to date, and their presence has been demonstrated in many genomes, including the human genome. G-quadruplexes are selectively bound by many regulatory proteins. In this paper, we analyzed the amino acid composition of all seventy-seven described G-quadruplex binding proteins of Homo sapiens. Comparison with amino acid frequencies in all human proteins and in specific protein subsets (e.g., all nucleic acid binding proteins) revealed unique features of quadruplex binding proteins, with prominent enrichment for glycine (G) and arginine (R). Cluster analysis with bootstrap resampling shows similarities and differences in the amino acid composition of individual quadruplex binding proteins. Interestingly, we found that all characterized G-quadruplex binding proteins share a 20-amino-acid motif/domain (RGRGR GRGGG SGGSG GRGRG) that is similar to the previously described RG-rich domain (RRGDG RRRGG GGRGQ GGRGR GGGFKG) of the FMR1 G-quadruplex binding protein. Based on this protein fingerprint, we predicted a new set of potential G-quadruplex binding proteins that share this domain rich in glycine and arginine residues.
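The kind of composition analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `composition` and `rg_rich_windows`, the 20-residue sliding window, and the 0.8 threshold for the combined G+R fraction are all assumptions chosen for illustration. The only sequence used is the 20-residue motif quoted in the abstract, written without spaces.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# The 20-residue motif quoted in the abstract, with spaces removed.
MOTIF = "RGRGRGRGGGSGGSGGRGRG"


def composition(seq):
    """Return the fraction of each standard amino acid in seq."""
    seq = seq.upper()
    counts = Counter(c for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def rg_rich_windows(seq, window=20, min_fraction=0.8):
    """Return (start, fraction) for every window whose combined
    glycine + arginine fraction is at least min_fraction.

    Both window and min_fraction are illustrative assumptions,
    not values taken from the paper.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        frac = (chunk.count("G") + chunk.count("R")) / window
        if frac >= min_fraction:
            hits.append((start, frac))
    return hits


if __name__ == "__main__":
    comp = composition(MOTIF)
    print({aa: round(f, 2) for aa, f in comp.items() if f > 0})
    print(rg_rich_windows(MOTIF))
```

Applied to a full protein sequence, `composition` gives per-residue frequencies that could be compared against a background set, while `rg_rich_windows` flags candidate glycine- and arginine-rich stretches. A real screen would need a principled threshold and a statistical comparison against background proteins, which this sketch does not attempt.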