Mudge Jonathan M, Harrow Jennifer
Department of Computational Genomics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.
Illumina Cambridge Ltd, Chesterford Research Park, Little Chesterford, Saffron Walden CB10 1XL, UK.
Nat Rev Genet. 2016 Dec;17(12):758-772. doi: 10.1038/nrg.2016.119. Epub 2016 Oct 24.
A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe - or 'annotate' - genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists - from clinicians to evolutionary biologists - need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challenging.