College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China.
Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA.
Brief Bioinform. 2018 May 1;19(3):361-373. doi: 10.1093/bib/bbw118.
Genomic islands (GIs) are associated with microbial adaptations, carry sequence patterns different from those of the host, and are sporadically distributed among closely related species. This compositional bias can dominate the signal of interest in GI detection. However, compositional variation also exists among segments of the host genome itself, and there is no uniform standard for how best to discriminate GIs from the rest of the genome in terms of compositional bias. In the present work, we propose a robust software tool, MTGIpick, which uses regions whose pattern bias shows multiscale difference levels to identify GIs from the host. MTGIpick can identify GIs from a single genome without genome annotation or prior knowledge from other data sets. When applied to real biological data, MTGIpick performed better than existing methods and revealed potential GIs, with accurate sizes, that existing methods miss because they apply a uniform standard. Software and supplementary materials are freely available at http://bioinfo.zstu.edu.cn/MTGI or https://github.com/bioinfo0706/MTGIpick.
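The general idea of scanning a single genome for compositional bias at more than one scale can be illustrated with a minimal sketch. This is not the MTGIpick algorithm, whose details the abstract does not give. The k-mer size, window sizes, step, Manhattan distance and all function names below are assumptions chosen only for illustration.

```python
from collections import Counter


def kmer_freqs(seq, k):
    """Relative k-mer frequencies of seq, counting only pure A/C/G/T k-mers."""
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def composition_distance(p, q):
    """Manhattan distance between two k-mer frequency profiles."""
    return sum(abs(p.get(m, 0.0) - q.get(m, 0.0)) for m in set(p) | set(q))


def multiscale_scan(genome, k=2, window_sizes=(200, 400), step=100):
    """Score sliding windows against the whole-genome profile at several scales.

    Returns {window_size: [(start, score), ...]}. The whole genome serves as
    the background, so no annotation or reference data set is needed.
    All parameter defaults are illustrative assumptions.
    """
    genome = genome.upper()
    background = kmer_freqs(genome, k)
    scores = {}
    for w in window_sizes:
        scores[w] = [
            (start,
             composition_distance(kmer_freqs(genome[start:start + w], k),
                                  background))
            for start in range(0, len(genome) - w + 1, step)
        ]
    return scores
```

Windows whose score stands out at several window sizes are candidate regions. How to set thresholds and how to merge windows into islands of a definite size are left open in this sketch.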