MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford, UK.
Hum Mol Genet. 2010 Oct 15;19(R2):R162-8. doi: 10.1093/hmg/ddq362. Epub 2010 Aug 25.
Genomic tiling arrays, cDNA sequencing and, more recently, RNA-Seq have provided initial insights into the extent and depth of transcribed sequence across human and other genomes. These methods have led to greatly improved annotations of protein-coding genes, but have also identified transcription outside of annotated exons. One resultant issue that has aroused dispute is the balance of transcription of known exons against transcription outside of known exons. While non-genic 'dark matter' transcription was found by tiling arrays to be pervasive, it was seen to contribute only a small percentage of the polyadenylated transcriptome in some RNA-Seq experiments. This apparent contradiction has been compounded by a lack of clarity about what exactly constitutes a protein-coding gene. It remains unclear, for example, whether or not all transcripts that overlap on either strand within a genomic locus should be assigned to a single gene locus, including those that fail to share promoters, exons and splice junctions. The inability of tiling arrays and RNA-Seq to count transcripts, rather than exons or exon pairs, adds to these difficulties. While there is agreement that thousands of apparently non-coding loci are present outside of protein-coding genes in the human genome, there is vigorous debate about what constitutes evidence for their functionality. These issues will only be resolved upon the demonstration, or otherwise, that organismal or cellular phenotypes frequently result when non-coding RNA loci are disrupted.