Lu Bingxin, Leong Hon Wai
Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417, Republic of Singapore.
J Bioinform Comput Biol. 2016 Feb;14(1):1640003. doi: 10.1142/S0219720016400035.
Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaptation and enable antibiotic resistance. Many methods have been proposed to predict GIs, but most rely either on annotations or on comparisons with closely related genomes, so they cannot easily be applied to newly sequenced genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods that detect GIs based solely on the sequence of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on a one-class support vector machine (SVM) and exploits composition bias in terms of k-mer content. In evaluations on three real genomes, GI-SVM achieves higher recall than current methods without much loss of precision. In addition, GI-SVM allows flexible parameter tuning to obtain optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GIs in newly sequenced genomes.
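The abstract describes the approach only at a high level: a one-class SVM trained on k-mer composition, so that regions whose composition departs from the bulk of the genome can be flagged as candidate GIs. To illustrate the input side of such a pipeline, the Python sketch below splits a genome into overlapping windows and computes a normalized k-mer frequency vector for each window. The function names and the defaults k=4, window=5000 and step=2500 are hypothetical choices for illustration, not parameters taken from GI-SVM.

```python
from itertools import product


def kmer_index(k):
    """Map every k-mer over the alphabet ACGT to a fixed column (4**k columns)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def window_kmer_profiles(genome, k=4, window=5000, step=2500):
    """Return a list of (start, frequency_vector) pairs, one per sliding window.

    Hypothetical defaults; GI-SVM's actual window settings are not given here.
    Counts are divided by the number of valid k-mers in the window, so windows
    are comparable regardless of how many valid k-mers they contain. k-mers
    containing characters other than A/C/G/T (e.g. N) are skipped. Trailing
    bases beyond the last full window are not covered in this simple sketch.
    """
    genome = genome.upper()
    index = kmer_index(k)
    profiles = []
    # If the genome is shorter than one window, a single window covers it all.
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        segment = genome[start:start + window]
        counts = [0] * len(index)
        total = 0
        for i in range(len(segment) - k + 1):
            col = index.get(segment[i:i + k])
            if col is not None:  # ignore k-mers with ambiguous bases
                counts[col] += 1
                total += 1
        freqs = [c / total for c in counts] if total else [0.0] * len(counts)
        profiles.append((start, freqs))
    return profiles
```

Under the same assumptions, these vectors could then be passed to an off-the-shelf one-class SVM, for example scikit-learn's `sklearn.svm.OneClassSVM`, fitted on all windows, with the windows that receive the lowest `decision_function` scores treated as candidate GIs. How GI-SVM itself chooses the kernel, `nu`, and window parameters is not specified in this abstract.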