Vernikos Georgios S, Parkhill Julian
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Bioinformatics. 2006 Sep 15;22(18):2196-203. doi: 10.1093/bioinformatics/btl369. Epub 2006 Jul 12.
There is a growing literature on the detection of Horizontal Gene Transfer (HGT) events by means of parametric, non-comparative methods. Such approaches rely only on sequence information and use various low- and high-order indices to capture compositional deviation from the genome backbone; the superiority of the latter over the former has been shown elsewhere. However, even high-order k-mers may be poor estimators of HGT when insufficient information is available, e.g. in short sliding windows. Most current HGT prediction methods require pre-existing annotation, which may restrict their application to newly sequenced genomes.
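As a minimal illustration of the sparse-data problem (not code from the paper or from alien_hunter), the Python sketch below scores one sliding-window position with a generic fixed-order index: the Kullback-Leibler divergence between the window's k-mer distribution and the genome's. The helper names, the pseudocount and the random toy genome are assumptions. A window of L bases contains at most L - k + 1 distinct k-mers, so a 5 kb window can show at most 4,993 of the 65,536 possible 8-mers, and most observed 8-mer frequencies in such a window rest on one or two occurrences.

```python
import math
import random
from collections import Counter


def kmer_freqs(seq, k):
    """Relative frequencies of the overlapping k-mers observed in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def relative_entropy(window, genome, k, pseudo=1e-9):
    """Kullback-Leibler divergence D(window || genome) over the window's k-mers.

    `pseudo` is a hypothetical floor for k-mers absent from the genome.
    """
    p = kmer_freqs(window, k)
    q = kmer_freqs(genome, k)
    return sum(pw * math.log(pw / q.get(w, pseudo)) for w, pw in p.items())


def coverage(window, k):
    """Fraction of the 4**k possible k-mers that occur at least once in window."""
    seen = {window[i:i + k] for i in range(len(window) - k + 1)}
    return len(seen) / 4 ** k


# Toy data (assumption): a uniform random 50 kb "genome" and one 5 kb window of it.
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(50_000))
window = genome[10_000:15_000]
for k in (2, 4, 8):
    print(k, round(coverage(window, k), 3), round(relative_entropy(window, genome, k), 4))
```

Even though the window is cut from the same uniform random sequence, its divergence at k = 8 should come out far larger than at k = 2, purely because of sparse counts; this is the estimation problem that motivates combining information across motif orders.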
We introduce a novel computational method, Interpolated Variable Order Motifs (IVOMs), which exploits compositional biases using variable-order motif distributions and captures the local composition of a sequence more reliably than fixed-order methods. For optimal localization of the boundaries of each predicted region, a second-order, two-state hidden Markov model (HMM) is implemented in a change-point detection framework. We applied the IVOM approach to the genome of Salmonella enterica serovar Typhi CT18, a prokaryote whose HGT events are well studied, and show that IVOMs outperform state-of-the-art low- and high-order motif methods, predicting not only the already characterized Salmonella Pathogenicity Islands (SPI-1 to SPI-10) but also three novel SPIs (SPI-15, SPI-16, SPI-17) and other HGT events.
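The abstract does not give the IVOM weighting or the HMM parameters, so the following is only a minimal Python sketch of the two ideas under stated assumptions, not the alien_hunter implementation. First, each window position gets a mixture of motif frequencies of orders 1 to K, where the longest motif receives weight only when the window contains it often enough and the remaining weight is deferred to shorter motifs. Second, a two-state HMM decoded with the Viterbi algorithm turns the track of window scores into backbone (0) and alien (1) segments. The function names, the count `threshold`, the maximum order of 4, the Gaussian emissions and all HMM parameters are hypothetical; note also that this HMM is first-order, whereas the paper uses a second-order model.

```python
import math
import random
from collections import Counter


def motif_counts(seq, max_order):
    """Counter of overlapping motifs for every order 1..max_order."""
    return {k: Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
            for k in range(1, max_order + 1)}


def ivom_like_score(window, genome_counts, max_order=4, threshold=50):
    """Mean log-ratio of interpolated window vs genome motif frequencies.

    ASSUMPTION: at each position the longest motif takes min(1, count/threshold)
    of the remaining weight, the rest is deferred to shorter motifs, and order 1
    takes whatever is left. Illustrative only, not the published IVOM weights.
    """
    win_counts = motif_counts(window, max_order)
    win_tot = {k: sum(c.values()) for k, c in win_counts.items()}
    gen_tot = {k: sum(c.values()) for k, c in genome_counts.items()}
    n = len(window) - max_order + 1
    total = 0.0
    for i in range(n):
        remaining, p_win, p_gen = 1.0, 0.0, 0.0
        for k in range(max_order, 0, -1):
            motif = window[i:i + k]
            c = win_counts[k][motif]  # >= 1 because the motif occurs at position i
            lam = remaining if k == 1 else remaining * min(1.0, c / threshold)
            p_win += lam * c / win_tot[k]
            p_gen += lam * genome_counts[k][motif] / gen_tot[k]
            remaining -= lam
        total += math.log(p_win / max(p_gen, 1e-12))
    return total / n


def viterbi_two_state(scores, mu=(0.0, 1.0), sigma=1.0, p_stay=0.99):
    """Most likely 0 (backbone) / 1 (alien) labelling of a score track.

    First-order two-state HMM with Gaussian emissions sharing one sigma, so the
    Gaussian normalising constant cancels and is omitted. Parameters are illustrative.
    """
    if not scores:
        return []
    log_t = [[math.log(p_stay), math.log(1 - p_stay)],
             [math.log(1 - p_stay), math.log(p_stay)]]

    def log_e(s, x):
        return -((x - mu[s]) ** 2) / (2 * sigma ** 2)

    v = [math.log(0.5) + log_e(s, scores[0]) for s in (0, 1)]
    back = []  # back[t - 1][s]: best predecessor of state s at step t
    for x in scores[1:]:
        ptr, nv = [], []
        for s in (0, 1):
            best = max((0, 1), key=lambda r: v[r] + log_t[r][s])
            ptr.append(best)
            nv.append(v[best] + log_t[best][s] + log_e(s, x))
        back.append(ptr)
        v = nv
    state = max((0, 1), key=lambda s: v[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy genome (assumption): 40 kb uniform backbone with a 5 kb AT-rich insert at 20 kb.
random.seed(1)
backbone = "".join(random.choice("ACGT") for _ in range(40_000))
island = "".join(random.choices("ACGT", weights=[4, 1, 1, 4], k=5_000))
genome = backbone[:20_000] + island + backbone[20_000:]
gen_counts = motif_counts(genome, 4)
win = 5_000
scores = [ivom_like_score(genome[s:s + win], gen_counts)
          for s in range(0, len(genome) - win + 1, win)]
lo, hi = min(scores), max(scores)
labels = viterbi_two_state(scores, mu=(lo, hi), sigma=(hi - lo) / 4, p_stay=0.9)
print([round(s, 3) for s in scores], labels)
```

On this toy genome the AT-rich insert (the fifth window) should get the highest score and be the only window labelled 1. Deferring weight to shorter motifs is what keeps the estimate stable when a long motif occurs only once or twice in a window; raising `threshold` shifts more weight toward lower orders.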
The software is available under a GPL license as a standalone application at http://www.sanger.ac.uk/Software/analysis/alien_hunter
Supplementary data are available at Bioinformatics online.