Van Bibber Nathan W, Haerle Cornelia, Khalife Roy, Dayhoff Guy W, Uversky Vladimir N
Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Boulevard, Tampa, Florida 33612, United States.
Department of Chemistry, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States.
J Phys Chem B. 2020 Sep 17;124(37):8050-8070. doi: 10.1021/acs.jpcb.0c07676. Epub 2020 Sep 3.
Segmental duplications (i.e., highly homologous DNA fragments greater than 1 kb in length that are present at more than one site within a genome) are typically found in genome regions that are prone to rearrangements. A noticeable fraction of the human genome (∼5%) consists of segmental duplications (or duplicons), which are assumed to play a number of vital roles in human evolution, human-specific adaptation, and genomic instability. Despite their importance for crucial events such as synaptogenesis, neuronal migration, and neocortical expansion, these segmental duplications remain rather poorly characterized. Of particular interest are the core duplicon gene (CDG) families, which are replicates that share common "core" DNA among randomly attached pieces, expand along single chromosomes, and might harbor newly acquired protein domains. Another important feature of the proteins encoded by CDG families is their multifunctionality. Although these proteins seem likely to possess many characteristic features of intrinsically disordered proteins, to the best of our knowledge, a systematic investigation of the intrinsic disorder predisposition of the proteins encoded by core duplicon gene families has not yet been conducted. To fill this gap and to determine the degree to which these proteins might be affected by intrinsic disorder, we analyzed a set of human proteins encoded by the members of 10 core duplicon gene families. Our analysis revealed that the vast majority of these proteins are highly disordered, with their disordered regions often serving as a means of protein-protein interaction and/or being targeted by numerous posttranslational modifications of various kinds.