RCPdb: An evolutionary classification and codon usage database for repeat-containing proteins.

Authors

Faux Noel G, Huttley Gavin A, Mahmood Khalid, Webb Geoffrey I, de la Banda Maria Garcia, Whisstock James C

Affiliation

Protein Crystallography Unit, Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia.

Publication information

Genome Res. 2007 Jul;17(7):1118-27. doi: 10.1101/gr.6255407. Epub 2007 Jun 13.

Abstract

Over 3% of human proteins contain single amino acid repeats (repeat-containing proteins, RCPs). Many repeats (homopeptides) localize to important proteins involved in transcription, and the expansion of certain repeats, in particular poly-Q and poly-A tracts, can also lead to the development of neurological diseases. Previous studies have suggested that the homopeptide makeup is a result of the presence of G+C-rich tracts in the encoding genes and that expansion occurs via replication slippage. Here, we have performed a large-scale genomic analysis of the variation of the genes encoding RCPs in 13 species and present these data in an online database (http://repeats.med.monash.edu.au/genetic_analysis/). This resource allows rapid comparison and analysis of RCPs, homopeptides, and their underlying genetic tracts across the eukaryotic species considered. We report three major findings. First, there is a bias for a small subset of codons being reiterated within homopeptides, and there is no G+C or A+T bias relative to the organism's transcriptome. Second, single base pair transversions from the homocodon are unusually common and may represent a mechanism of reducing the rate of homopeptide mutations. Third, homopeptides that are conserved across different species lie within regions that are under stronger purifying selection in contrast to nonconserved homopeptides.
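The terms used in the abstract (homopeptide, underlying codon tract, homocodon, transversion) can be illustrated with a minimal Python sketch. It is not the authors' pipeline or the RCPdb implementation. The function names, the `min_len=5` run-length threshold, and the choice to treat the most frequent codon in a tract as the homocodon are all assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby

PURINES = {"A", "G"}  # C and T are pyrimidines


def homopeptide_tracts(protein, cds, min_len=5):
    """Yield (residue, start, codons) for each run of identical residues.

    `min_len` is a hypothetical threshold; the abstract does not state one.
    `cds` is assumed to be the in-frame coding sequence for `protein`.
    """
    if len(cds) < 3 * len(protein):
        raise ValueError("coding sequence is shorter than the protein")
    pos = 0
    for residue, group in groupby(protein):
        run = len(list(group))
        if run >= min_len:
            codons = [cds[3 * i:3 * i + 3] for i in range(pos, pos + run)]
            yield residue, pos, codons
        pos += run


def substitution_type(a, b):
    """Classify a single-base change as a transition or a transversion."""
    if a == b:
        return None
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def summarize(codons):
    """Return codon usage, the assumed homocodon, and single-base variant counts."""
    usage = Counter(codons)
    major = usage.most_common(1)[0][0]  # assumption: most frequent codon
    variants = Counter()
    for codon in codons:
        diffs = [(x, y) for x, y in zip(major, codon) if x != y]
        if len(diffs) == 1:  # only codons one base away from the homocodon
            variants[substitution_type(*diffs[0])] += 1
    return usage, major, variants


if __name__ == "__main__":
    # Toy poly-Q example, not real data.
    protein = "MQQQQQQK"
    cds = "ATG" + "CAGCAGCAACAGCAGCAG" + "AAA"
    for residue, start, codons in homopeptide_tracts(protein, cds):
        print(residue, start, summarize(codons))
```

In the toy sequence, the single CAA codon differs from the CAG homocodon by a G-to-A change, which counts as a transition. A GCG-to-GCC change inside a poly-A tract would count as a transversion.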
