Sharakhova Maria V, Hammond Martin P, Lobo Neil F, Krzywinski Jaroslaw, Unger Maria F, Hillenmeyer Maureen E, Bruggner Robert V, Birney Ewan, Collins Frank H
Center for Global Health and Infectious Diseases, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN 46556-0369, USA.
Genome Biol. 2007;8(1):R5. doi: 10.1186/gb-2007-8-1-r5.
The genome of Anopheles gambiae, the major vector of malaria, was sequenced and assembled in 2002. The initial assembly made available to the scientific community had several problems: scaffolds with no chromosomal location, no sequence data for the Y chromosome, haplotype polymorphisms that produced two different assemblies of limited regions, and contaminating bacterial DNA.
Polytene chromosome in situ hybridization with cDNA clones was used to place 15 unmapped scaffolds (totaling 5.34 Mbp) in the pericentromeric regions of the chromosomes and to orient a further 9 scaffolds. Additional in situ hybridization of bacterial artificial chromosome (BAC) clones placed 5 scaffolds (1.32 Mbp) in the physical gaps between scaffolds on euchromatic parts of the chromosomes. The Y chromosome sequence information (0.18 Mbp) remains highly incomplete and fragmented among 55 short scaffolds. Analysis of BAC end sequences showed that 22 inter-scaffold gaps were spanned by BAC clones. Unmapped scaffolds were also aligned to the chromosome assemblies in silico, identifying 144 scaffolds (totaling 8.18 Mbp) that are probably represented in the genome project by two alternative assemblies. An additional 3.53 Mbp of alternative assembly was identified within mapped scaffolds. A further 679 small scaffolds (1.97 Mbp) were identified as probably derived from contaminating bacterial DNA. In total, about 33% of previously unmapped sequences were placed on the chromosomes.
This study has used new approaches to improve the physical map and assembly of the A. gambiae genome.
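The BAC end sequence analysis reported above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `spanned_gaps`, the input layout (each BAC mapped to the two scaffolds its end sequences hit), and the scaffold names are all hypothetical, chosen only to show the idea that a BAC whose two ends land on neighbouring scaffolds bridges the gap between them. Orientation and insert-size checks, which a real analysis would need, are left out.

```python
from collections import defaultdict


def spanned_gaps(end_hits, scaffold_order):
    """Group BAC clones by the adjacent scaffold pair they bridge.

    end_hits: {bac_id: (scaffold_of_end_1, scaffold_of_end_2)}  (hypothetical layout)
    scaffold_order: scaffold names in their order along one chromosome arm
    """
    # Index of each mapped scaffold along the chromosome arm.
    position = {name: i for i, name in enumerate(scaffold_order)}
    bridges = defaultdict(list)
    for bac_id, (scaf_a, scaf_b) in end_hits.items():
        # Skip clones with an end on a scaffold that is not in the ordered map.
        if scaf_a not in position or scaf_b not in position:
            continue
        # Ends on neighbouring scaffolds: the clone spans the gap between them.
        if abs(position[scaf_a] - position[scaf_b]) == 1:
            left, right = sorted((scaf_a, scaf_b), key=position.get)
            bridges[(left, right)].append(bac_id)
    return dict(bridges)


# Illustrative data only; these names do not come from the study.
order = ["s1", "s2", "s3"]
hits = {
    "bacA": ("s1", "s2"),  # bridges s1|s2
    "bacB": ("s2", "s2"),  # both ends on one scaffold, no gap spanned
    "bacC": ("s3", "s2"),  # bridges s2|s3, ends listed in reverse order
    "bacD": ("s1", "sX"),  # one end on an unmapped scaffold
}
result = spanned_gaps(hits, order)
```

Counting the keys of `result` gives the number of inter-scaffold gaps with at least one spanning clone, which is the kind of quantity the abstract reports.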