School of Computing, National University of Singapore, Computing 1, 13 Computing Drive, Singapore 117417, Republic of Singapore.
BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S63. doi: 10.1186/1471-2105-11-S1-S63.
Conserved gene clusters are groups of genes that are located close to one another in the genomes of several species. They tend to code for proteins that have a functional interaction. The identification of conserved gene clusters is an important step towards understanding genome evolution and predicting gene function.
In this paper, we propose a novel pairwise gene cluster model that combines the notion of bidirectional best hits with the r-window model introduced in 2003 by Durand and Sankoff. The bidirectional best hit (BBH) constraint removes the need to specify the minimum number of shared genes in the r-window model and improves the relevance of the results. We design a subquadratic time algorithm to compute the set of BBH r-window gene clusters efficiently.
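The bidirectional best hit notion can be illustrated with a minimal sketch. It assumes the standard definition: gene a in one genome and gene b in the other form a BBH when b is the highest-scoring match of a and a is the highest-scoring match of b. The function name `bidirectional_best_hits`, the score dictionary, and the gene labels are hypothetical, and the sketch does not reproduce the subquadratic r-window algorithm, which the abstract does not detail.

```python
from typing import Dict, Set, Tuple


def bidirectional_best_hits(
    scores: Dict[Tuple[str, str], float]
) -> Set[Tuple[str, str]]:
    """Return gene pairs (a, b) that are each other's best-scoring match.

    `scores` maps (gene_in_genome_A, gene_in_genome_B) to a similarity
    score (hypothetical input; higher means more similar).
    Ties are broken in favor of the pair encountered first.
    """
    best_for_a: Dict[str, Tuple[str, float]] = {}
    best_for_b: Dict[str, Tuple[str, float]] = {}

    # Single pass: track the best hit of every gene in both directions.
    for (a, b), s in scores.items():
        if a not in best_for_a or s > best_for_a[a][1]:
            best_for_a[a] = (b, s)
        if b not in best_for_b or s > best_for_b[b][1]:
            best_for_b[b] = (a, s)

    # Keep a pair only if the best-hit relation holds in both directions.
    return {
        (a, b)
        for a, (b, _) in best_for_a.items()
        if best_for_b[b][0] == a
    }


# Hypothetical example: a1 and a2 both prefer b1, but b1 prefers a2.
example_scores = {
    ("a1", "b1"): 90.0,
    ("a1", "b2"): 50.0,
    ("a2", "b1"): 95.0,
    ("a2", "b2"): 60.0,
}
bbh_pairs = bidirectional_best_hits(example_scores)
```

In this toy input only the pair (a2, b1) satisfies the constraint in both directions, which shows how the BBH requirement prunes one-sided matches without a user-chosen threshold.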
We apply our cluster model to the comparative analysis of E. coli K-12 and B. subtilis and perform an extensive comparison between our new model and the gene teams model developed by Bergeron et al. Compared with the gene teams model, our new cluster model has slightly lower recall but higher precision at all levels of recall when the results are ranked using statistical tests. An analysis of the most significant BBH r-window gene clusters shows that they correspond to known operons.
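Precision at each level of recall over a ranked result list can be computed as in the sketch below. It assumes each ranked cluster has already been labeled correct or incorrect against some reference set; how the paper defines a correct cluster is not stated in the abstract, so the labels, the function name `precision_recall_points`, and the example values are hypothetical.

```python
from typing import List, Tuple


def precision_recall_points(
    ranked_labels: List[bool], total_positives: int
) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each position of a ranked list.

    `ranked_labels[i]` is True when the i-th most significant prediction
    is considered correct (hypothetical labeling). `total_positives` is
    the number of reference items that could have been recovered.
    """
    if total_positives <= 0:
        raise ValueError("total_positives must be positive")

    points: List[Tuple[float, float]] = []
    true_positives = 0
    for k, is_correct in enumerate(ranked_labels, start=1):
        if is_correct:
            true_positives += 1
        recall = true_positives / total_positives
        precision = true_positives / k
        points.append((recall, precision))
    return points


# Hypothetical ranking: correct, incorrect, correct; four reference items.
curve = precision_recall_points([True, False, True], total_positives=4)
```

Comparing two models then amounts to comparing their curves at matching recall values; a model can reach a lower maximum recall while still having higher precision wherever both curves are defined.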