Warren Wesley C, Hillier LaDeana W, Tomlinson Chad, Minx Patrick, Kremitzki Milinn, Graves Tina, Markovic Chris, Bouk Nathan, Pruitt Kim D, Thibaud-Nissen Francoise, Schneider Valerie, Mansour Tamer A, Brown C Titus, Zimin Aleksey, Hawken Rachel, Abrahamsen Mitch, Pyrkosz Alexis B, Morisson Mireille, Fillon Valerie, Vignal Alain, Chow William, Howe Kerstin, Fulton Janet E, Miller Marcia M, Lovell Peter, Mello Claudio V, Wirthlin Morgan, Mason Andrew S, Kuo Richard, Burt David W, Dodgson Jerry B, Cheng Hans H
McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63108
G3 (Bethesda). 2017 Jan 5;7(1):109-117. doi: 10.1534/g3.116.035923.
The importance of Gallus gallus (chicken) as a model organism and agricultural animal merits continued efforts to improve its sequence assembly. We present a new version of the chicken genome assembly (Gallus_gallus-5.0; GCA_000002315.3), built by combining long single-molecule sequencing, finished BACs, and improved physical maps. In overall assembled bases, we see a gain of 183 Mb, including 16.4 Mb in placed chromosomes, with a corresponding gain in the percentage of intact repeat elements characterized. Within the 1.21 Gb genome, we include three previously missing autosomes, GGA30, 31, and 33, and improve sequence contig length 10-fold over the previous Gallus_gallus-4.0. Despite these significant improvements in base representation, 138 Mb of sequence is not yet located to chromosomes. When annotated for gene content, Gallus_gallus-5.0 shows an increase of 4679 annotated genes (2768 noncoding and 1911 protein-coding) over Gallus_gallus-4.0. We also revisited the question of which genes are missing in the avian lineage, as assessed by the highest-quality avian genome assembly to date, and found that a large fraction of the original set of missing genes is still absent in sequenced bird species. Finally, our new data support a detailed map of MHC-B encompassing two segments: one with a highly stable gene copy number and another in which the gene copy number is highly variable. The chicken model has been a critical resource for many other fields of study, and this new reference assembly will substantially further these efforts.