Organismal complexity strongly correlates with the number of protein families and domains.

Authors

Alvarez-Ponce David, Krishnamurthy Subramanian

Affiliations

Biology Department, University of Nevada, Reno, NV 89557.

Duncan and Nancy MacMillan Cancer Immunology and Metabolism Center of Excellence, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901.

Publication information

Proc Natl Acad Sci U S A. 2025 Feb 4;122(5):e2404332122. doi: 10.1073/pnas.2404332122. Epub 2025 Jan 28.

Abstract

In the pregenomic era, scientists were puzzled by the observation that haploid genome size (the C-value) did not correlate well with organismal complexity. This phenomenon, called the "C-value paradox," is mostly explained by the fact that protein-coding genes occupy only a small fraction of eukaryotic genomes. When the first genome sequences became available, scientists were even more surprised by the fact that the number of genes (G-value) was also a poor predictor of complexity, which gave rise to the "G-value paradox." The proposed explanations usually invoke mechanisms that increase the information content of each individual gene (protein-protein interactions, intrinsic disorder, posttranslational modifications, alternative splicing, etc.). Less attention has been paid to mechanisms that increase the amount of genetic material but do not increase (or not to the same extent) the amount of information encoded in the genome, such as gene duplication and domain shuffling. Proteins belonging to the same family and/or sharing the same domains often carry out similar or even redundant functions. We thus hypothesized that an organism's number of different protein families and domains should be suitable predictors of organismal complexity. In agreement with our hypothesis, we observed that the number of protein families, clans, domains, and motifs increases from simple to progressively more complex organisms. In addition, these metrics correlate with the number of cell types better than and independently of the number of protein-coding genes and several previously proposed predictors of organismal complexity. Our observations have the potential to represent a resolution to the G-value paradox.
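The claim that a metric correlates with the number of cell types "better than and independently of" the number of protein-coding genes can be illustrated with a rank correlation and a first-order partial correlation. The abstract does not say which statistics were used, so the choice of Spearman correlation here is an assumption, and every number in the toy dataset below is invented purely for illustration (it is not data from the paper). A minimal sketch using only the Python standard library:

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def partial_spearman(x, y, z):
    """Rank correlation of x and y after controlling for z."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# HYPOTHETICAL toy values for six imaginary organisms (not from the paper).
cell_types = [1, 3, 10, 50, 120, 200]
protein_families = [1000, 1500, 2500, 3500, 4500, 4000]
gene_counts = [4000, 6000, 20000, 14000, 30000, 20000]

print("families vs cell types:", spearman(protein_families, cell_types))
print("genes vs cell types:", spearman(gene_counts, cell_types))
print("families vs cell types, controlling for genes:",
      partial_spearman(protein_families, cell_types, gene_counts))
```

In this framing, "better than" corresponds to the first coefficient exceeding the second, and "independently of" corresponds to the partial correlation staying well above zero once gene count is held fixed. A real analysis across species would also need to account for phylogenetic non-independence, which this sketch ignores.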

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c095/11804679/ba9c29a26550/pnas.2404332122fig03.jpg
