Galzitskaya Oxana V, Novikov Georgii S, Dovidchenko Nikita V, Lobanov Mikhail Yu
* Institute of Protein Research, Russian Academy of Sciences, Institutskaya Str., 4, Pushchino, Moscow Region 142290, Russia.
† St. Petersburg Academic University, Nanotechnology Research and Education Centre of the Russian Academy of Sciences, St. Petersburg, Khlopina Str., 8/3, 194021, Russia.
J Bioinform Comput Biol. 2019 Feb;17(1):1950010. doi: 10.1142/S0219720019500100.
We analyzed codon usage in poly-Q stretches of different lengths in the human proteome. First, we found that all long poly-Q stretches in the Protein Data Bank (PDB) belong to disordered regions. Second, we found a codon usage bias for glutamine homo-repeats in the human proteome. When a poly-Q stretch is encoded by a single repeated codon, only CAG triplets are found. Similar results are obtained for human disease-associated proteins with glutamine homo-repeats. Moreover, the fraction of proteins in which glutamine homo-repeats are encoded by a single codon is lower for disease-associated proteins from the HraDis database (22%) than for proteins from the human proteome (26%). We also show that in some cases (28) splicing sites in the human proteome correspond to poly-Q homo-repeats, and in 11 cases these sites appear at the -terminal part of the homo-repeats with statistical significance 10 .
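The "same codon" criterion described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the in-frame coding sequence input, and the `min_len` threshold are all assumptions made for illustration. The only fact relied on is that glutamine is encoded by CAG and CAA in the standard genetic code.

```python
# Glutamine (Q) codons in the standard genetic code.
GLN_CODONS = {"CAG", "CAA"}


def polyq_codon_runs(cds, min_len=4):
    """Return (start_codon_index, codons) for each run of at least
    min_len consecutive glutamine codons in an in-frame coding sequence.

    min_len is a hypothetical threshold, not a value taken from the paper.
    """
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    runs, current, start = [], [], 0
    for idx, codon in enumerate(codons + [""]):  # empty sentinel flushes the last run
        if codon in GLN_CODONS:
            if not current:
                start = idx
            current.append(codon)
        else:
            if len(current) >= min_len:
                runs.append((start, current))
            current = []
    return runs


def uses_single_codon(codons):
    """True if every codon in the run is identical (e.g. all CAG)."""
    return len(set(codons)) == 1


if __name__ == "__main__":
    # Toy sequence: one pure-CAG run and one mixed CAG/CAA run.
    toy_cds = "ATG" + "CAG" * 5 + "GGT" + "CAGCAACAGCAG" + "TAA"
    for start, codons in polyq_codon_runs(toy_cds):
        print(start, len(codons), uses_single_codon(codons))
```

Applied across all coding sequences of a proteome, counting the proteins for which `uses_single_codon` holds would give a fraction comparable in spirit to the 22% and 26% figures, although the exact definitions used in the study may differ.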