Lopes Katia de Paiva, Campos-Laborie Francisco José, Vialle Ricardo Assunção, Ortega José Miguel, De Las Rivas Javier
Bioinformatics and Functional Genomics Group, Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), Salamanca, Spain.
Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas (ICB), Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brasil.
BMC Genomics. 2016 Oct 25;17(Suppl 8):725. doi: 10.1186/s12864-016-3062-y.
The development of large-scale technologies for quantitative transcriptomics has enabled comprehensive analysis of gene expression profiles across complete genomes. RNA-Seq measures gene expression levels far more precisely and globally than previous methods. Studies using this technology are changing our view of the extent and complexity of eukaryotic transcriptomes. Accordingly, multiple efforts have been made to determine and analyse the gene expression patterns of human cell types under different conditions, in both normal and pathological states. However, until recently, little has been reported about the evolutionary marks present in human protein-coding genes, particularly from the combined perspective of gene expression and protein evolution.
We present a combined analysis of human protein-coding gene expression profiling and time-scale ancestry mapping. This analysis places the genes in taxonomic clades and reveals eight major evolutionary steps ("hallmarks") that include clusters of functionally coherent proteins. The expressed human genes are analysed using an RNA-Seq dataset of 116 samples from 32 tissues. The evolutionary analysis of the human proteins combines information from: (i) a database of orthologous proteins (OMA), (ii) the taxonomic mapping of genes to lineage clades (from NCBI Taxonomy) and (iii) the evolutionary time-scale mapping provided by TimeTree (Timescale of Life). The human protein-coding genes are also placed in a relational context by constructing a robust gene coexpression network, which reveals tighter links between protein-coding genes of related evolutionary age and identifies functionally coherent gene modules.
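The two ideas combined above, dating each gene by the oldest clade in which it has an ortholog and then checking whether coexpressed genes tend to share an age, can be illustrated with a toy sketch. This is not the authors' pipeline. It does not query OMA, NCBI Taxonomy or TimeTree, and it does not reproduce their robust network construction. The clade list, gene names (`GENE_A` to `GENE_D`), expression values and the Pearson correlation threshold of 0.8 are all hypothetical placeholders chosen for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical clades on the human lineage, ordered from oldest to youngest.
# Placeholder labels only; not the paper's eight hallmarks.
CLADES_OLDEST_FIRST = ["Eukaryota", "Metazoa", "Vertebrata", "Mammalia", "Primates"]


def gene_age_clade(ortholog_clades):
    """Assign a gene to the oldest clade in which it has at least one ortholog."""
    for clade in CLADES_OLDEST_FIRST:
        if clade in ortholog_clades:
            return clade
    return None  # no ortholog information in any listed clade


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation
    return sxy / sqrt(sxx * syy)


def coexpression_edges(expr, threshold=0.8):
    """Link gene pairs whose correlation across samples reaches the (assumed) threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(expr), 2)
        if pearson(expr[g1], expr[g2]) >= threshold
    ]


def same_age_fraction(edges, ages):
    """Fraction of network edges that connect two genes assigned to the same clade."""
    if not edges:
        return 0.0
    return sum(1 for a, b in edges if ages[a] == ages[b]) / len(edges)


# Toy data: expression across four samples and the clades where orthologs were found.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.1],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [8.0, 6.1, 4.0, 2.0],
}
ortholog_clades = {
    "GENE_A": {"Eukaryota", "Metazoa", "Mammalia"},
    "GENE_B": {"Eukaryota", "Vertebrata"},
    "GENE_C": {"Primates"},
    "GENE_D": {"Mammalia", "Primates"},
}

ages = {gene: gene_age_clade(clades) for gene, clades in ortholog_clades.items()}
edges = coexpression_edges(expr)
fraction = same_age_fraction(edges, ages)
print(ages)
print(edges)
print(fraction)
```

In this toy example, `GENE_A` and `GENE_B` are both dated to Eukaryota and are coexpressed. `GENE_C` (Primates) and `GENE_D` (Mammalia) are also coexpressed but differ in age, so half of the edges link same-age genes. On real data, such a fraction would need to be compared against a null model, for example by permuting the age labels, before it could support a claim of tighter same-age links.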
Understanding the relational landscape of human protein-coding genes is essential for interpreting the functional elements and modules of our active genome. Moreover, decoding the evolutionary history of human genes can provide valuable information for uncovering their origin and function.