Verbiest Max A, Delucchi Matteo, Bilgin Sonay Tugce, Anisimova Maria
Institute of Applied Simulation, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland.
Institute for Computational Science, Faculty of Science, University of Zurich, Zurich, Switzerland.
Front Bioinform. 2021 Jun 8;1:685844. doi: 10.3389/fbinf.2021.685844. eCollection 2021.
Short tandem repeats (STRs) are abundant in genomic sequences and are known for comparatively high mutation rates; STRs are therefore thought to be a potent source of genetic diversity. In protein-coding sequences, STRs primarily encode disorder-promoting amino acids and are often located in intrinsically disordered regions (IDRs). STRs are frequently studied in the scope of microsatellite instability (MSI) in cancer, with little focus on the connection between protein STRs and IDRs. We believe, however, that this relationship should be explicitly included when ascertaining STR functionality in cancer. Here, we explore this notion using all canonical human proteins from SwissProt, wherein we detected 3,699 STRs. Over 80% of these consisted entirely of disorder-promoting amino acids. Of the amino acids in STR sequences, 62.1% were also predicted to be in an IDR, compared to 14.2% for non-repeat sequences. Over-representation analysis showed STR-containing proteins to be primarily located in the nucleus, where they perform protein- and nucleotide-binding functions and regulate gene expression. They were also enriched in cancer-related signaling pathways. Furthermore, we found enrichments of STR-containing proteins among those correlated with patient survival for cancers derived from eight different anatomical sites. Intriguingly, several of these cancer types are not known to have an MSI-high (MSI-H) phenotype, suggesting that protein STRs play a role in cancer pathology in non-MSI-H settings. Their intrinsic link with IDRs could therefore be an attractive topic of future research to further explore the role of STRs and IDRs in cancer. We speculate that our observations may be linked to the known dosage-sensitivity of disordered proteins, which could hint at a concentration-dependent gain-of-function mechanism in cancer for proteins containing STRs and IDRs.
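The residue-level overlap statistic reported above (the share of STR amino acids that also fall in a predicted IDR) can be illustrated with a minimal sketch. This is not the authors' pipeline: the interval representation (1-based, inclusive residue coordinates), the function names, and the toy coordinates are all assumptions made for illustration, and the toy numbers are not data from the study.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumed representation: 1-based, inclusive (start, end) residue positions.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals so no residue is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_residues(
    strs: Sequence[Interval], idrs: Sequence[Interval]
) -> Tuple[int, int]:
    """Return (STR residues inside an IDR, total STR residues) for one protein."""
    str_merged = merge_intervals(strs)
    idr_merged = merge_intervals(idrs)
    total = sum(end - start + 1 for start, end in str_merged)
    inside = 0
    for start, end in str_merged:
        for idr_start, idr_end in idr_merged:
            lo, hi = max(start, idr_start), min(end, idr_end)
            if lo <= hi:  # the two intervals share at least one residue
                inside += hi - lo + 1
    return inside, total


def fraction_in_idr(
    proteins: Iterable[Tuple[Sequence[Interval], Sequence[Interval]]]
) -> float:
    """Pool residues over all proteins and return the STR-in-IDR fraction."""
    inside = total = 0
    for strs, idrs in proteins:
        i, t = covered_residues(strs, idrs)
        inside += i
        total += t
    return inside / total if total else 0.0


# Hypothetical toy input: (STR intervals, IDR intervals) per protein.
toy_proteins = [
    ([(10, 19)], [(15, 30)]),               # 5 of 10 STR residues lie in the IDR
    ([(1, 6), (5, 8)], [(1, 4), (3, 4)]),   # 4 of 8 STR residues lie in the IDR
]
print(fraction_in_idr(toy_proteins))
```

Pooling residues across proteins, rather than averaging per-protein fractions, matches the way the statistic is phrased above (a percentage of amino acids), but that choice is also an assumption of this sketch.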