Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, California 95064, USA.
10x Genomics, Pleasanton, California 94566, USA.
Genome Res. 2018 Jul;28(7):1029-1038. doi: 10.1101/gr.233460.117. Epub 2018 Jun 8.
The recent introductions of low-cost, long-read, and read-cloud sequencing technologies, coupled with intense efforts to develop efficient algorithms, have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire clades and identify orthology relationships. We show that CAT can be used to improve annotations on the rat genome, annotate the great apes, annotate a diverse set of mammals, and annotate personal, diploid human genomes. We demonstrate the resulting discovery of novel genes, isoforms, and structural variants, even in genomes as well studied as rat and the great apes, and how these annotations improve cross-species RNA expression experiments.