Lok Si, Paton Tara A, Wang Zhuozhi, Kaur Gaganjot, Walker Susan, Yuen Ryan K C, Sung Wilson W L, Whitney Joseph, Buchanan Janet A, Trost Brett, Singh Naina, Apresto Beverly, Chen Nan, Coole Matthew, Dawson Travis J, Ho Karen, Hu Zhizhou, Pullenayegum Sanjeev, Samler Kozue, Shipstone Arun, Tsoi Fiona, Wang Ting, Pereira Sergio L, Rostami Pirooz, Ryan Carol Ann, Tong Amy Hin Yan, Ng Karen, Sundaravadanam Yogi, Simpson Jared T, Lim Burton K, Engstrom Mark D, Dutton Christopher J, Kerr Kevin C R, Franke Maria, Rapley William, Wintle Richard F, Scherer Stephen W
The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada.
Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada.
G3 (Bethesda). 2017 Feb 9;7(2):755-773. doi: 10.1534/g3.116.038208.
The Canadian beaver (Castor canadensis) is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome, the first for a large rodent and the first mammalian genome assembled directly from uncorrected, moderate-coverage (< 30×) long reads generated by single-molecule sequencing. The genome size was estimated at 2.7 Gb by k-mer analysis. We assembled the beaver genome using the new Canu assembler, which is optimized for noisy reads. The resulting assembly was refined with Pilon using short reads (80×) and checked for accuracy by its congruency with an independent short-read assembly. We scaffolded the assembly using exon-gene models derived from 9805 full-length open reading frames (FL-ORFs) constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with a contig N50 of 278,680 bp and a scaffold N50 of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, and the combined scaffold length represented 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly were demonstrated by the precise exon placement for 91.1% of the 9805 assembled FL-ORFs and 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set used to assess the quality of genome assemblies. Genes involved in dentition and enamel deposition, defining characteristics of rodents with which the beaver is well endowed, were well represented. The study provides insights for genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology.
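The abstract relies on two standard assembly statistics: a k-mer-based genome size estimate and contig/scaffold N50. The sketch below shows the generic textbook form of both calculations; it is an illustration under stated assumptions, not the procedure or software used in the study. The function names (`n50`, `kmer_histogram`, `estimate_genome_size`) are hypothetical, and the genome-size estimate assumes a single dominant k-mer coverage peak (no heterozygosity or repeat modelling) and treats k-mers seen fewer than `min_depth` times as sequencing errors.

```python
from collections import Counter


def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths given")


def kmer_histogram(reads, k):
    """Count every k-mer in the reads, then return a histogram mapping
    multiplicity (depth) -> number of distinct k-mers with that depth."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return dict(Counter(counts.values()))


def estimate_genome_size(histogram, min_depth=2):
    """Simplified estimate: total k-mer occurrences divided by the depth
    at the coverage peak. Assumption (hypothetical threshold): k-mers with
    depth < min_depth are sequencing errors and are discarded."""
    filtered = {d: n for d, n in histogram.items() if d >= min_depth}
    if not filtered:
        raise ValueError("no k-mers at or above min_depth")
    peak = max(filtered, key=filtered.get)  # most frequent depth
    total = sum(depth * n for depth, n in filtered.items())
    return total / peak
```

N50 as defined here is relative to the assembly's own total length; using the estimated genome size (for example, the 2.7 Gb k-mer estimate) as the denominator instead gives the related NG50 statistic.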