Hypermutability of genes in Homo sapiens due to the hosting of long mono-SSR.

Authors

Loire Etienne, Praz Françoise, Higuet Dominique, Netter Pierre, Achaz Guillaume

Affiliation

Université Pierre et Marie Curie-Paris 6, Unité Mixte de recherche 7592, Institut Jacques Monod, Paris, France.

Publication

Mol Biol Evol. 2009 Jan;26(1):111-21. doi: 10.1093/molbev/msn230. Epub 2008 Oct 8.

Abstract

Simple sequence repeats (SSRs) are very common short repeats in eukaryotic genomes. "Long" SSRs are considered "hypermutable" sequences because they exhibit a high rate of expansion and contraction. Because they are potentially deleterious, long SSRs tend to be uncommon in coding sequences. However, several genes contain long SSRs in their exonic sequences. Here, we identify 1,291 human genes that host a mononucleotide SSR long enough to be prone to expansion or contraction; we refer to these genes as hypermutable hereafter. On the basis of Gene Ontology annotations, we show that only a restricted number of functions are overrepresented among these hypermutable genes, including cell cycle and maintenance of DNA integrity. Using a probabilistic model, we show that genes involved in these functions are expected to host long SSRs because they tend to be long and/or are biased in nucleotide composition. Finally, we show that for almost all functions we observe fewer hypermutable sequences than expected under a neutral model. There are, however, interesting exceptions: functions such as protein and RNA transport, meiosis, and mismatch repair have as many hypermutable genes as expected under neutrality. Conversely, there are functions (e.g., collagen-related genes) where hypermutable genes are more strongly avoided than in other functions. Our results show that, even though several functions harbor unusually long SSRs in their exons, long SSRs are deleterious sequences in almost all functions and are removed by purifying selection. The strength of this purifying selection, however, varies greatly from function to function. We discuss possible explanations for this intriguing result.
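
The two computational ideas in the abstract, detecting long mononucleotide SSRs and asking how often they would be expected from sequence length and base composition alone, can be illustrated with a minimal Python sketch. This is not the authors' pipeline or their probabilistic model: the length threshold of 8, the function names `find_mono_ssrs` and `prob_long_run`, and the assumption of independent, identically distributed bases are all illustrative choices made here.

```python
import re


def find_mono_ssrs(seq, min_len=8):
    """Return (start, base, length) for every maximal run of a single
    nucleotide that is at least min_len long.

    min_len=8 is an illustrative threshold, not the one used in the paper.
    """
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


def prob_long_run(n, freqs, min_len=8):
    """Probability that a random sequence of length n contains at least one
    mononucleotide run of length >= min_len.

    Assumes bases are drawn independently with the frequencies in `freqs`
    (a simplification made for this sketch).
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    if n == 0:
        return 0.0
    bases = list(freqs)
    # alive[b][j]: probability that no long run has occurred so far and the
    # sequence currently ends in a run of exactly j + 1 copies of base b.
    alive = {b: [0.0] * (min_len - 1) for b in bases}
    for b in bases:
        alive[b][0] = freqs[b]
    for _ in range(n - 1):
        total = sum(sum(row) for row in alive.values())
        updated = {}
        for b in bases:
            # Start a new run of b after a different base ...
            row = [freqs[b] * (total - sum(alive[b]))]
            # ... or extend an existing run of b that stays below min_len.
            row += [freqs[b] * alive[b][j] for j in range(min_len - 2)]
            updated[b] = row
        alive = updated
    return 1.0 - sum(sum(row) for row in alive.values())


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.40, "C": 0.10, "G": 0.10, "T": 0.40}

print(find_mono_ssrs("ACGT" + "A" * 9 + "GC"))
print(prob_long_run(1000, uniform), prob_long_run(3000, uniform))
print(prob_long_run(1000, at_rich))
```

Under this simplified model, the probability of hosting a long run rises with sequence length and with compositional bias, which is the qualitative effect the abstract attributes to long or compositionally biased genes.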
