Studying genomes through the aeons: protein families, pseudogenes and proteome evolution.

Author information

Harrison Paul M, Gerstein Mark

Affiliation

Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520-8114, USA.

Publication information

J Mol Biol. 2002 May 17;318(5):1155-74. doi: 10.1016/s0022-2836(02)00109-2.

Abstract

Protein families can be used to understand many aspects of genomes, both their "live" and their "dead" parts (i.e. genes and pseudogenes). Surveys of genomes have revealed that, in every organism, there are always a few large families and many small ones, with the overall distribution following a power-law. This commonality is equally true for both genes and pseudogenes, and exists despite the fact that the specific families that are enlarged differ greatly between organisms. Furthermore, because of family structure there is great redundancy in proteomes, a fact linked to the large number of dispensable genes for each organism and the small size of the minimal, indispensable sub-proteome. Pseudogenes in prokaryotes represent families that are in the process of being dispensed with. In particular, the genome sequences of certain pathogenic bacteria (Mycobacterium leprae, Yersinia pestis and Rickettsia prowazekii) show how an organism can undergo reductive evolution on a large scale (i.e. the dying out of families) as a result of niche change. There appears to be less pressure to delete pseudogenes in eukaryotes. These can be divided into two varieties, duplicated and processed, where the latter involves reverse transcription from an mRNA intermediate. We discuss these collectively in yeast, worm, fly, and human. The fly has few pseudogenes apparently because of its high rate of genomic DNA deletion. In the other three organisms, the distribution of pseudogenes on the chromosome and amongst different families is highly non-uniform. Pseudogenes tend not to occur in the middle of chromosome arms, and tend to be associated with lineage-specific (as opposed to highly conserved) families that have environmental-response functions. This may be because, rather than being dead, they may form a reservoir of diverse "extra parts" that can be resurrected to help an organism adapt to its surroundings. 
In yeast, there may be a novel mechanism involving the [PSI+] prion that potentially enables this resurrection. In worm, the pseudogenes tend to arise out of families (e.g. chemoreceptors) that are greatly expanded in it compared to the fly. The human genome stands out in having many processed pseudogenes. These have a character very different from those of the duplicated variety, to a large extent just representing random insertions. Thus, their occurrence tends to be roughly in proportion to the amount of mRNA for a particular protein and to reflect the extent of the intergenic sequences. Further information about pseudogenes is available at http://genecensus.org/pseudogene.
