Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia.
Cell Cycle. 2013 Jul 1;12(13):2061-72. doi: 10.4161/cc.25134. Epub 2013 Jun 6.
It is now clear that animal genomes are predominantly non-protein-coding, and that these sequences encode a wide array of RNA transcripts and other regulatory elements that are fundamental to the development of complex life. We have previously argued that the proportion of an animal genome that is non-protein-coding DNA (ncDNA) correlates well with its apparent biological complexity. Here we extend that work and, using data from a total of 1,627 prokaryotic and 153 eukaryotic complete and annotated genomes, show that the proportion of ncDNA per haploid genome is significantly positively correlated with a previously published proxy of biological complexity, the number of distinct cell types. In contrast, the amount of the genome that encodes proteins is, as we show, essentially unchanged across Metazoa. Furthermore, using a total of 179 RNA-seq data sets from nematode (47), fruit fly (72), zebrafish (20) and human (42), we show, consistent with other recent reports, that the vast majority of ncDNA in animals is transcribed. This includes more than 60 human loci previously considered "gene deserts," many of which are expressed in a tissue-specific manner and are associated with previously reported GWAS SNPs. These results suggest that ncDNA, and the ncRNAs encoded within it, may be intimately involved in the evolution, maintenance and development of complex life.
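The central quantity above, the proportion of ncDNA per haploid genome, and its correlation with cell-type number can be illustrated with a minimal Python sketch. The abstract does not state which correlation statistic was used; Spearman rank correlation is an assumption made here for illustration. The function names (`ncdna_fraction`, `ranks`, `spearman`) are hypothetical, and the numbers in `toy_genomes` are invented placeholders, not values from the study.

```python
from statistics import mean


def ncdna_fraction(genome_bp, coding_bp):
    """Fraction of a haploid genome that is not protein-coding."""
    if genome_bp <= 0 or not 0 <= coding_bp <= genome_bp:
        raise ValueError("need genome_bp > 0 and 0 <= coding_bp <= genome_bp")
    return 1.0 - coding_bp / genome_bp


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented placeholder genomes: (genome size in bp, coding bp, cell types).
toy_genomes = [
    (4_000_000, 3_500_000, 1),
    (12_000_000, 8_400_000, 3),
    (100_000_000, 25_000_000, 20),
    (3_000_000_000, 45_000_000, 200),
]

fractions = [ncdna_fraction(g, c) for g, c, _ in toy_genomes]
cell_types = [n for _, _, n in toy_genomes]
rho = spearman(fractions, cell_types)
```

A rank-based statistic is used in this sketch because genome sizes and cell-type counts span several orders of magnitude, so a monotonic trend is easier to assess than a linear one. A significance test would additionally be needed to support the word "significantly".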