Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China.
Genomics Lab, Department of Plant Breeding and Genetics, Faculty of Agricultural Sciences and Technology, Bahauddin Zakariya University, Multan, Punjab, 60000, Pakistan.
BMC Genomics. 2020 Jul 8;21(1):470. doi: 10.1186/s12864-020-06814-5.
Genome sequencing technologies have improved at an exponential pace, but precise chromosome-scale genome assembly remains a great challenge. The draft genome of cultivated G. arboreum was sequenced and assembled with a shotgun sequencing approach; however, it contains several misassemblies. To address this issue, we generated an improved reassembly of G. arboreum chromosome 12 using genetic mapping and reference-assisted approaches, and evaluated this reconstruction by comparing it with the homologous chromosomes of G. raimondii and G. hirsutum.
In this study, we generated a high-quality assembly of G. arboreum chromosome 12 (A_A12), 94.64 Mb in length, comprising 144 scaffolds and containing 3361 protein-coding genes. Syntenic and collinear analysis of the reconstructed chromosome A_A12 against its homologous chromosomes in G. raimondii (D_D08) and G. hirsutum (AD_A12 and AD_D12) confirmed that the current reassembly is significantly improved in quality compared with the previous one. We found major misassemblies in the previously assembled chromosome 12 (A_Ca9) of G. arboreum, particularly in the anchoring and orienting of scaffolds into a pseudo-chromosome. Further, the homologous chromosomes 12 of G. raimondii (D_D08) and G. arboreum (A_A12) contained almost equal numbers of transcription factor (TF)-related genes and showed a good collinear relationship with each other. In addition, a higher rate of gene loss was found in the corresponding homologous chromosomes of tetraploid cotton (AD_A12 and AD_D12) than in those of diploid cotton (A_A12 and D_D08), suggesting that gene loss is likely a continuing process in the chromosomal evolution of tetraploid cotton.
This study offers a more accurate strategy for correcting misassemblies in sequenced draft genomes of cotton, which will provide further insight into cotton genome organization.
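The anchoring and orienting of scaffolds mentioned in the results can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `anchor_and_orient`, the marker tuple layout, and the toy data are all hypothetical, and the approach shown (order scaffolds by median genetic-map position, orient them by the sign of the covariance between physical and genetic position) is only one simple way such a step might be done.

```python
from collections import defaultdict
from statistics import median


def anchor_and_orient(markers):
    """Order and orient scaffolds along a pseudo-chromosome (illustrative only).

    markers: iterable of (scaffold_id, scaffold_pos_bp, map_pos_cM) tuples,
    a hypothetical input layout chosen for this sketch.
    Returns a list of (scaffold_id, orientation) sorted by map position,
    where orientation is "+", "-", or "?" when it cannot be determined.
    """
    by_scaffold = defaultdict(list)
    for scaffold, pos, cm in markers:
        by_scaffold[scaffold].append((pos, cm))

    placed = []
    for scaffold, pts in by_scaffold.items():
        # Anchor: median genetic position of the scaffold's markers.
        anchor = median(cm for _, cm in pts)

        # Orientation: sign of the covariance between physical position
        # on the scaffold and genetic position on the map.
        mean_pos = sum(p for p, _ in pts) / len(pts)
        mean_cm = sum(c for _, c in pts) / len(pts)
        cov = sum((p - mean_pos) * (c - mean_cm) for p, c in pts)
        if cov > 0:
            orientation = "+"
        elif cov < 0:
            orientation = "-"
        else:
            orientation = "?"  # single marker or no informative signal

        placed.append((anchor, scaffold, orientation))

    placed.sort()
    return [(scaffold, orientation) for _, scaffold, orientation in placed]


# Toy data, invented for illustration.
toy_markers = [
    ("s2", 100, 30.0), ("s2", 900, 25.0),   # cM falls as bp rises: reversed
    ("s1", 50, 1.0), ("s1", 500, 4.0),      # cM rises with bp: forward
    ("s3", 10, 50.0),                       # one marker: orientation unknown
]
layout = anchor_and_orient(toy_markers)
```

A scaffold with a single marker can be anchored but not oriented, which is one reason a reference-assisted comparison against a homologous chromosome can help alongside a genetic map.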