Waack Stephan, Keller Oliver, Asper Roman, Brodag Thomas, Damm Carsten, Fricke Wolfgang Florian, Surovcik Katharina, Meinicke Peter, Merkl Rainer
Institut für Informatik, Universität Göttingen, Lotzestr. 16-18, 37083 Göttingen, Germany.
BMC Bioinformatics. 2006 Mar 16;7:142. doi: 10.1186/1471-2105-7-142.
Horizontal gene transfer (HGT) is considered a strong evolutionary force that substantially shapes the content of microbial genomes. What distinguishes HGT from gene genesis, duplication, or mutation is its speed, which enables rapid adaptation to changing environmental demands. Characterizing HGT precisely requires algorithms that identify transfer events with high reliability. The transferred pieces of DNA are frequently long and comprise several genes; such regions are called genomic islands (GIs) or, more specifically, pathogenicity or symbiotic islands.
We have implemented the program SIGI-HMM, which predicts GIs and the putative donor of each individual alien gene. It is based on an analysis of the codon usage (CU) of every gene in the genome under study. The CU of each gene is compared against a carefully selected set of CU tables representing microbial donors or highly expressed genes. Multiple tests are used to identify putatively alien genes, to predict putative donors, and to mask putatively highly expressed genes. These tests determine the states and emission probabilities of an inhomogeneous hidden Markov model that operates at the gene level. The transition probabilities draw upon classical test theory so that a sensitivity controller can be integrated in a consistent manner. SIGI-HMM is written in Java and is publicly available. It accepts as input any file in EMBL format and generates output in the common GFF format, which genome browsers can read. Benchmark tests showed that the output of SIGI-HMM agrees with known findings: its predictions were consistent both with annotated GIs and with predictions generated by other methods.
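The idea of scoring each gene's codon usage against reference tables and then smoothing the decisions along the genome can be illustrated with a minimal sketch. It is not the SIGI-HMM implementation: the log-odds score, the pseudocount, the single donor table, the two-state homogeneous model, and the `stay` probability are all simplifying assumptions made here for illustration, and every function name is hypothetical.

```python
import math
from collections import Counter

CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]

def codon_counts(seq):
    """Count in-frame codons of a coding sequence; a trailing partial codon is ignored."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

def codon_frequencies(counts, pseudocount=1.0):
    """Turn codon counts into a smoothed frequency table over all 64 codons."""
    total = sum(counts.get(c, 0) for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts.get(c, 0) + pseudocount) / total for c in CODONS}

def log_odds(gene_seq, host_freq, donor_freq):
    """Log-likelihood ratio of a gene's codons under the donor table versus the host table.

    Positive values mean the gene looks more like the donor (assumed scoring rule).
    """
    score = 0.0
    for codon, n in codon_counts(gene_seq).items():
        if codon in host_freq:  # skip codons containing ambiguous bases
            score += n * math.log(donor_freq[codon] / host_freq[codon])
    return score

def viterbi_two_state(scores, stay=0.9):
    """Label each gene 'native' or 'alien' with a two-state HMM over the gene order.

    Emissions are expressed relative to the native state: 'native' adds 0 and
    'alien' adds the gene's log-odds score. `stay` is an assumed probability of
    remaining in the same state from one gene to the next.
    """
    if not scores:
        return []
    states = ("native", "alien")
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)

    def emit(state, x):
        return x if state == "alien" else 0.0

    best = {s: math.log(0.5) + emit(s, scores[0]) for s in states}
    back = []  # back[i][s] = best predecessor of state s at gene i + 1
    for x in scores[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            cand = {p: prev[p] + (log_stay if p == s else log_switch) for p in states}
            pointers[s] = max(cand, key=cand.get)
            best[s] = cand[pointers[s]] + emit(s, x)
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(best, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

In this sketch, a gene with a weakly negative score that sits between two clearly donor-like genes is still labelled `alien`, because two extra state switches cost more than the small score penalty. That is the kind of contiguity a gene-level HMM contributes on top of per-gene tests.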
SIGI-HMM is a sensitive tool for identifying GIs in microbial genomes. It allows users to analyze genomes interactively and in detail, and to generate or test hypotheses about the origin of acquired genes.