Vishnoi Anchal, Roy Rahul, Bhattacharya Alok
Center for Computational Biology and Bioinformatics, School of Information Technology, Indian Statistical Institute, New Delhi 110016, India.
Nucleic Acids Res. 2007;35(11):3654-67. doi: 10.1093/nar/gkm209. Epub 2007 May 8.
Comparative genomic approaches are useful for identifying molecular differences between organisms. Currently available methods fail to identify small changes in genomes, such as the expansion of short repetitive motifs, and cannot analyse divergent sequences. In this report, we describe an anchor-based whole genome comparison (ABWGC) method. ABWGC is based on random sampling of anchor sequences from one genome, followed by analysis of the sampled regions and their homologous regions in the target genome. The method was applied to compare two strains of Mycobacterium tuberculosis, CDC1551 and H37Rv. ABWGC identified a total of 104 indels, including 20 expansions of short repetitive sequences and five recombination events; these included 18 previously unidentified genomic differences. ABWGC also identified 188 SNPs, including eight new ones. The method was also used to compare the M. tuberculosis H37Rv and M. avium genomes. ABWGC correctly identified 1002 more indels (size >100 nt) between the two organisms than MUMmer, a popular tool for comparative genomics. ABWGC also correctly identified repeat expansions and indels in a set of simulated sequences. The study further revealed an important role for small repeat expansions in the evolution of M. tuberculosis strains.
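The abstract only outlines the idea behind ABWGC: anchors are randomly sampled from one genome and the sampled region is compared with its homologous region in the target genome. The sketch below is a hypothetical, much-simplified illustration of that idea, not the authors' implementation. It assumes exact-match k-mer anchors that must be unique in both genomes, and it calls an indel whenever the spacing between two anchors differs between the genomes. The function names `unique_find` and `anchor_indels` and the parameters `k`, `window`, `n_samples` and `seed` are our own choices.

```python
import random


def unique_find(seq: str, kmer: str):
    """Return the start of kmer in seq if it occurs exactly once, else None."""
    first = seq.find(kmer)
    if first == -1 or seq.find(kmer, first + 1) != -1:
        return None  # absent, or repeated (ambiguous anchor)
    return first


def anchor_indels(ref: str, target: str, k: int = 12, window: int = 50,
                  n_samples: int = 200, seed: int = 0) -> dict:
    """Hypothetical anchor-pair comparison (not the published ABWGC code).

    Randomly samples pairs of k-mer anchors `window` nt apart in `ref`,
    locates each anchor uniquely in `target`, and reports the change in
    anchor spacing: positive = insertion in target, negative = deletion.
    Returns {left-anchor position in ref: size difference}.
    """
    rng = random.Random(seed)
    calls = {}
    for _ in range(n_samples):
        i = rng.randrange(0, len(ref) - window - k + 1)
        left, right = ref[i:i + k], ref[i + window:i + window + k]
        # Skip anchors that are not unique in the reference genome.
        if unique_find(ref, left) is None or unique_find(ref, right) is None:
            continue
        tl, tr = unique_find(target, left), unique_find(target, right)
        # Skip anchors with no unique homologous hit, or hits out of order.
        if tl is None or tr is None or tr < tl:
            continue
        delta = (tr - tl) - window
        if delta != 0:
            calls[i] = delta
    return calls


rng = random.Random(1)
ref = "".join(rng.choice("ACGT") for _ in range(2000))
# Simulated target: a 9-nt expansion of a short repeat inserted at position 1000.
target = ref[:1000] + "ACG" * 3 + ref[1000:]
print(sorted(set(anchor_indels(ref, target, n_samples=2000).values())))
```

Every sampled anchor pair that straddles the same insertion reports the same size difference, so this sketch returns redundant calls; a fuller implementation would likely merge them and handle mismatches (SNPs) and divergent anchors, which exact matching here cannot.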