Merkl Rainer
Abteilung Molekulare Genetik und Präparative Molekularbiologie, Institut für Mikrobiologie und Genetik, Georg-August-Universität Göttingen and Göttingen Genomics Laboratory, Grisebachstr. 8, 37077 Göttingen, Germany.
BMC Bioinformatics. 2004 Mar 3;5:22. doi: 10.1186/1471-2105-5-22.
Genomic islands can be observed in many microbial genomes. These stretches of DNA have a conspicuous composition with regard to sequence or encoded functions. Genomic islands are assumed to be frequently acquired via horizontal gene transfer. For the analysis of genome structure and the study of horizontal gene transfer, it is necessary to reliably identify and characterize these islands.
A scoring scheme based on codon frequencies, Score_G1G2(cdn) = log(f_G2(cdn) / f_G1(cdn)), was used. To analyse the genes of a species G1 and test their relatedness to a species G2, scores were computed from the log-odds of the mean codon frequencies of the two genomes. A non-redundant set of nearly 400 codon usage tables from microbial species was derived, and each member was used in turn at position G2. Genes with at least one score above a species-specific, dynamically determined cut-off value were analysed further. Cluster analysis then identified genes that form clusters of statistically significant size; these clusters were predicted to be genomic islands. Finally, for each of these genes individually, the taxonomic relation among the species responsible for significant scores was interpreted. The validity of the approach and its limitations were examined through an extensive analysis of natural genes and of synthetic genes designed to model the process of gene amelioration.
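The log-odds scoring step can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the pseudocount, the pooling of all 64 codons into one frequency table, and the use of the mean per-codon score as the gene-level score are all assumptions made for illustration. The cut-off determination and the cluster analysis are not covered.

```python
import math
from collections import Counter

# All 64 codons; pooling them into one table is an assumption of this sketch.
CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]


def codon_counts(seq):
    """Count in-frame codons of a coding sequence; a trailing partial codon is ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(genes, pseudocount=1.0):
    """Relative codon frequencies pooled over the genes of one genome.

    The pseudocount is hypothetical (not from the abstract); it keeps the
    log ratio finite for codons that a genome never uses.
    """
    counts = Counter()
    for gene in genes:
        counts.update(codon_counts(gene))
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def codon_scores(freq_g1, freq_g2):
    """Score_G1G2(cdn) = log(f_G2(cdn) / f_G1(cdn)) for every codon."""
    return {c: math.log(freq_g2[c] / freq_g1[c]) for c in CODONS}


def gene_score(gene, scores):
    """Mean per-codon score of a gene (the aggregation is an assumption).

    Codons containing ambiguous bases are skipped.
    """
    counts = codon_counts(gene)
    n = sum(k for c, k in counts.items() if c in scores)
    if n == 0:
        return 0.0
    return sum(scores[c] * k for c, k in counts.items() if c in scores) / n
```

Under these assumptions, a positive gene score means the gene's codon usage resembles G2 more than G1. In a setting like the one described, each codon usage table in the reference set would take the place of G2 in turn.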
The method reliably identifies genomic islands and the likely origin of alien genes.