Martínez Octavio, Reyes-Valdés M Humberto
Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Cinvestav, Campus Guanajuato, Apartado Postal 629, C.P. 36500 Irapuato, Guanajuato, Mexico.
Proc Natl Acad Sci U S A. 2008 Jul 15;105(28):9709-14. doi: 10.1073/pnas.0803479105. Epub 2008 Jul 7.
The transcriptome is the set of genes transcribed in a given tissue under specific conditions and can be characterized by a list of genes with their corresponding frequencies of transcription. Transcriptome changes can be measured by counting gene tags from mRNA libraries or by measuring light signals in DNA microarrays. In either case, it is difficult to fully comprehend the global changes that occur in the transcriptome, given that thousands of gene expression measurements are involved. We propose an approach to define and estimate the diversity and specialization of transcriptomes and gene specificity. We define transcriptome diversity as the Shannon entropy of its frequency distribution. Gene specificity is defined as the mutual information between the tissues and the corresponding transcript, allowing detection of either housekeeping or highly specific genes and clarifying the meaning of these concepts in the literature. Tissue specialization is measured by average gene specificity. We introduce the formulae using a simple example and show their application in two datasets of gene expression in human tissues. Plotting transcriptomes in a system of diversity and specialization coordinates makes it possible to understand their interrelations at a glance, showing which transcriptomes are richer in diversity of expressed genes and which are relatively more specialized. The framework presented clarifies the relations among transcriptomes, allowing a better understanding of their changes through the development of the organism or in response to environmental stimuli.
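The three measures named in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: every tissue is given equal weight 1/t, logarithms are base 2, and "average gene specificity" is taken to be the average weighted by each gene's relative frequency in the tissue. The function names and the toy counts are hypothetical, chosen only for illustration.

```python
import math

def normalize(counts):
    """Turn raw tag counts for one tissue into relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]

def transcriptome_diversity(freqs):
    """Shannon entropy (in bits) of one tissue's frequency distribution."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)

def gene_specificity(freqs_by_tissue):
    """Specificity of each gene across tissues.

    Assumes equal tissue weights 1/t; the value is 0 for a gene expressed
    at the same frequency everywhere and log2(t) for a gene found in only
    one tissue.
    """
    t = len(freqs_by_tissue)
    n_genes = len(freqs_by_tissue[0])
    result = []
    for i in range(n_genes):
        mean_p = sum(tissue[i] for tissue in freqs_by_tissue) / t
        s = 0.0
        for tissue in freqs_by_tissue:
            if tissue[i] > 0:  # a zero frequency contributes nothing
                ratio = tissue[i] / mean_p
                s += ratio * math.log2(ratio) / t
        result.append(s)
    return result

def tissue_specialization(freqs, specificity):
    """Average gene specificity, weighted by frequency in this tissue."""
    return sum(p * s for p, s in zip(freqs, specificity))

# Hypothetical toy data: tag counts for 3 genes in 2 tissues.
counts = {"tissue_a": [50, 50, 0], "tissue_b": [50, 0, 50]}
freqs = {name: normalize(c) for name, c in counts.items()}

diversity = {name: transcriptome_diversity(f) for name, f in freqs.items()}
specificity = gene_specificity(list(freqs.values()))
specialization = {
    name: tissue_specialization(f, specificity) for name, f in freqs.items()
}

print(diversity)       # {'tissue_a': 1.0, 'tissue_b': 1.0}
print(specificity)     # [0.0, 1.0, 1.0]
print(specialization)  # {'tissue_a': 0.5, 'tissue_b': 0.5}
```

In this toy case the first gene is expressed equally in both tissues and scores 0, which is how a housekeeping gene would look, while the other two genes each appear in a single tissue and reach the maximum of log2(2) = 1 bit. Each tissue's (diversity, specialization) pair gives its position in the coordinate system the abstract describes.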