Lu Bingxin, Leong Hon Wai
Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417, Republic of Singapore.
J Bioinform Comput Biol. 2018 Jun;16(3):1840010. doi: 10.1142/S0219720018400103. Epub 2018 Feb 4.
The accurate detection of genomic islands (GIs) in microbial genomes is important for both evolutionary study and medical research, because GIs may promote genome evolution and contain genes involved in pathogenesis. Various computational methods have been developed to predict GIs over the years. However, most of them cannot make full use of GI-associated features to achieve desirable performance. Additionally, many methods cannot be directly applied to newly sequenced genomes. We develop a new method called GI-Cluster, which provides an effective way to integrate multiple GI-related features via consensus clustering. GI-Cluster does not require training datasets or existing genome annotations, but it can still achieve comparable or better performance than supervised learning methods in comprehensive evaluations. Moreover, GI-Cluster is widely applicable, both to complete and incomplete genomes and to initial GI predictions from other programs. GI-Cluster also provides plots to visualize the distribution of predicted GIs and related features. GI-Cluster is available at https://github.com/icelu/GI_Cluster.