Ontario Institute for Cancer Research, Toronto, Canada.
BMC Genomics. 2010 Feb 10;11 Suppl 1(Suppl 1):S4. doi: 10.1186/1471-2164-11-S1-S4.
We address the problem of biological complexity by projecting the protein-coding genes of complex organisms onto the functional space of the proteome, defined as the set of all functions performed by the proteins of an organism. Alternative splicing (AS) allows an organism to generate diverse mature RNA transcripts from a single pre-mRNA, so it could be one of the key mechanisms for increasing the functional complexity of the organism's proteome and a driving force of biological evolution. Projecting transcription units (TU) and alternative splice-variant (SV) forms onto the proteome functional space could therefore generate new types of relational networks (e.g. SV-protein function networks, SFN) and lead to the discovery of novel evolutionarily conserved functional modules. Such networks might provide new, reliable characteristics of organism complexity and a better understanding of the evolutionary integration and plasticity of the interconnections among genome, transcriptome, and proteome functions.
We use the InterPro and UniProt databases to attribute descriptive features (keywords) to protein sequences. The UniProt database includes a controlled, curated vocabulary of specific descriptors, or keywords, which are assigned to a protein sequence via conserved domains or via similarity to annotated sequences. We treat each unique combination of keywords as a protein functional label (FL) that characterizes the biological functions of the given protein, and we construct contingency tables and graphs that project transcription units (TU) and alternative splice variants (SV) onto all FLs of the proteome of a given organism. We constructed SFNs for organisms with different evolutionary histories and levels of complexity and performed a detailed statistical parameterization of the networks.
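The keyword-to-label step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout `(tu_id, sv_id, keywords)`, the function names, and the toy identifiers and keywords are all assumptions made for illustration. Each FL is modelled as an order-independent set of keywords, and two mappings are built, one from each FL to the SVs that carry it and one from each TU to the FLs of its SVs.

```python
from collections import defaultdict

# Hypothetical toy records: (transcription unit, splice variant, keywords).
# The identifiers and keywords are invented for illustration only.
EXAMPLE_RECORDS = [
    ("TU1", "SV1a", ["Kinase", "ATP-binding"]),
    ("TU1", "SV1b", ["ATP-binding", "Kinase"]),   # same keyword set as SV1a
    ("TU2", "SV2a", ["Transport"]),
    ("TU2", "SV2b", ["Transport", "Membrane"]),
    ("TU3", "SV3a", ["Kinase", "ATP-binding"]),
]


def functional_label(keywords):
    """Treat a unique combination of keywords as one functional label (FL).

    A frozenset makes the label hashable and independent of keyword order.
    """
    return frozenset(keywords)


def build_projection(records):
    """Project SVs and TUs onto functional labels.

    Returns two dicts:
      fl_to_svs: FL -> set of splice-variant ids carrying that FL
      tu_to_fls: TU id -> set of FLs found among its splice variants
    """
    fl_to_svs = defaultdict(set)
    tu_to_fls = defaultdict(set)
    for tu_id, sv_id, keywords in records:
        fl = functional_label(keywords)
        fl_to_svs[fl].add(sv_id)
        tu_to_fls[tu_id].add(fl)
    return dict(fl_to_svs), dict(tu_to_fls)


def tus_with_multiple_labels(tu_to_fls):
    """Return TUs whose splice variants map to more than one FL."""
    return {tu for tu, fls in tu_to_fls.items() if len(fls) > 1}


if __name__ == "__main__":
    fl_to_svs, tu_to_fls = build_projection(EXAMPLE_RECORDS)
    for fl, svs in fl_to_svs.items():
        print(sorted(fl), "->", sorted(svs))
    print("TUs with more than one FL:", sorted(tus_with_multiple_labels(tu_to_fls)))
```

In this toy data, the two variants of TU1 share a single label, whereas the variants of TU2 map to two different labels. Counts derived from such mappings (SVs per FL, FLs per TU) are the kind of entries a contingency table would hold.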
Applying the algorithm to organisms with different evolutionary histories and levels of biological complexity (nematode, fruit fly, vertebrates) reveals that the parameters describing the SFN correlate with the complexity of the organism. Using statistical analysis of the links in the functional networks, we propose new features of the evolution of protein function acquisition. We reveal a group of genes and corresponding functions that could be attributed to an early, conserved part of the cellular machinery essential for cell viability and survival. We identify and characterize functional switches in the polyform group of TUs in different organisms. A comparison of mouse and human SFNs demonstrates a role for alternative splicing as a necessary source of evolution towards more complex organisms. The entire set of FLs across many organisms could be used as a draft catalogue of the functional space of the proteome world.