Department of Informatics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.
Genome Res. 2013 Dec;23(12):1961-73. doi: 10.1101/gr.161315.113. Epub 2013 Oct 30.
The last decade has seen tremendous effort committed to the annotation of the human genome sequence, most notably perhaps in the form of the ENCODE project. One of the major findings of ENCODE, and other genome analysis projects, is that the human transcriptome is far larger and more complex than previously thought. This complexity manifests, for example, as alternative splicing within protein-coding genes, as well as in the discovery of thousands of long noncoding RNAs. It is also possible that significant numbers of human transcripts have not yet been described by annotation projects, while existing transcript models are frequently incomplete. The question as to what proportion of this complexity is truly functional remains open, however, and this ambiguity presents a serious challenge to genome scientists. In this article, we will discuss the current state of human transcriptome annotation, drawing on our experience gained in generating the GENCODE gene annotation set. We highlight the gaps in our knowledge of transcript functionality that remain, and consider the potential computational and experimental strategies that can be used to help close them. We propose that an understanding of the true overlap between transcriptional complexity and functionality will not be gained in the short term. However, significant steps toward obtaining this knowledge can now be taken by using an integrated strategy, combining all of the experimental resources at our disposal.