Gamaleya Research Centre of Epidemiology and Microbiology, 123098, Moscow, Russia; Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Moscow Region, Russia.
Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Moscow Region, Russia.
Biosystems. 2024 Dec;246:105357. doi: 10.1016/j.biosystems.2024.105357. Epub 2024 Oct 21.
It is well known that genomes show codon usage bias, that is, some codons are observed more often than others. Codons within the homo-repeat regions of human proteins are no exception. In this work, we analyzed codon usage bias for all amino acid residues in homo-repeats longer than 4 residues, found in 3753 human proteins out of the 20447 protein sequences of the reviewed canonical human proteome. We found that almost all homo-repeats in the human proteome, most of which consist of Ala, Glu, Gly, Leu, Pro, or Ser (∼80% of all homo-repeats), show a codon usage bias, that is, each is encoded mainly by a single codon. Moreover, homo-repeats are strongly shifted toward GC-rich codons. Homo-repeats of Ala, Glu, Gly, Leu, Pro, and Ser predominate among those found in the PDB, where they occur in both ordered and disordered states. Examining the distribution of splicing sites, we found that about 15% of homo-repeats either contain a splicing site or lie within 10 nucleotides of one, and Glu and Leu predominate in these homo-repeats. These data are important for future studies of the functions of homo-repeats, protein-protein interactions, and evolutionary fitness.
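The kind of measurement described above (finding homo-repeats in a coding sequence, then asking which codon dominates each repeat and how GC-rich its codons are) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `homo_repeat_codon_usage`, the reading of "longer than 4" as a run of at least 5 identical residues, and the use of the standard genetic code on an in-frame coding sequence are all assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby, product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def homo_repeat_codon_usage(cds, min_len=5):
    """Report codon usage for each homo-repeat in an in-frame coding sequence.

    Hypothetical helper: min_len=5 is an assumed reading of "longer than 4".
    Returns one dict per run of >= min_len identical amino acid residues.
    """
    cds = cds.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    results = []
    pos = 0  # codon index of the start of the current run
    # Group consecutive codons by encoded residue; unknown codons map to "X".
    for aa, group in groupby(codons, key=lambda c: CODON_TABLE.get(c, "X")):
        run = list(group)
        if aa not in "*X" and len(run) >= min_len:
            counts = Counter(run)
            top_codon, top_n = counts.most_common(1)[0]
            gc = sum(c.count("G") + c.count("C") for c in run) / (3 * len(run))
            results.append({
                "residue": aa,
                "start": pos,                      # 0-based residue position
                "length": len(run),
                "codon_counts": dict(counts),
                "top_codon": top_codon,
                "top_fraction": top_n / len(run),  # share of the dominant codon
                "gc_content": gc,                  # GC fraction over the repeat
            })
        pos += len(run)
    return results


if __name__ == "__main__":
    # Toy sequence: Met, five Ala (GCC x4, GCG x1), three Glu, stop.
    toy = "ATG" + "GCC" * 4 + "GCG" + "GAG" * 3 + "TAA"
    for rep in homo_repeat_codon_usage(toy):
        print(rep)
```

In the toy sequence only the five-residue Ala run passes the length threshold; the three-residue Glu run is skipped. A repeat whose `top_fraction` is close to 1 is encoded mainly by one codon, which is the sense of bias used in the abstract.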