Winter Sascha, Jahn Katharina, Wehner Stefanie, Kuchenbecker Leon, Marz Manja, Stoye Jens, Böcker Sebastian
Chair for Bioinformatics, Institute for Computer Science, Friedrich-Schiller-University Jena, Jena, Germany.
Genome Informatics, Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany.
Nucleic Acids Res. 2016 Nov 16;44(20):9600-9610. doi: 10.1093/nar/gkw843. Epub 2016 Sep 26.
Gene-order-based comparison of multiple genomes provides signals for the functional analysis of genes and for the evolutionary process of genome organization. Gene clusters are regions of co-localized genes on the genomes of different species. The rapid increase in sequenced genomes necessitates bioinformatics tools for finding gene clusters in hundreds of genomes. Existing tools are often restricted to a few (in many cases, only two) genomes, and often make restrictive assumptions such as short perfect conservation, conserved gene order or monophyletic gene clusters. We present Gecko 3, an open-source software tool for finding gene clusters in hundreds of bacterial genomes that comes with an easy-to-use graphical user interface. The underlying gene cluster model is intuitive, can cope with low degrees of conservation as well as misannotations, and is complemented by a sound statistical evaluation. To evaluate the biological benefit of Gecko 3 and to exemplify our method, we search for gene clusters in a dataset of 678 bacterial genomes using Synechocystis sp. PCC 6803 as a reference. We confirm detected gene clusters by reviewing the literature and comparing them to a database of operons; we also detect two novel clusters, which were confirmed by publicly available experimental RNA-Seq data. The computational analysis is carried out on a laptop computer in <40 min.