Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the Salmonella pathogenicity islands.

Authors

Vernikos Georgios S, Parkhill Julian

Affiliation

The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SA, UK.

Publication

Bioinformatics. 2006 Sep 15;22(18):2196-203. doi: 10.1093/bioinformatics/btl369. Epub 2006 Jul 12.

Abstract

MOTIVATION

There is a growing literature on the detection of Horizontal Gene Transfer (HGT) events by means of parametric, non-comparative methods. Such approaches rely only on sequence information and use low- and high-order indices to capture compositional deviation from the genome backbone; the superiority of the latter over the former has been shown elsewhere. However, even high-order k-mers may be poor estimators of HGT when insufficient information is available, e.g. in short sliding windows. Most current HGT prediction methods require pre-existing annotation, which may restrict their application to newly sequenced genomes.
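The point about short sliding windows can be illustrated with simple counting: a window of length L contains at most L - k + 1 k-mers, while 4^k distinct k-mers are possible. The sketch below is not taken from the paper; the function name `kmer_coverage` and the 5000 bp window length are illustrative assumptions.

```python
def kmer_coverage(window_len: int, k: int) -> float:
    """Upper bound on the fraction of the 4**k possible k-mers
    that a window of the given length could contain."""
    n_positions = max(window_len - k + 1, 0)  # number of k-mers in the window
    return min(1.0, n_positions / 4 ** k)


if __name__ == "__main__":
    window_len = 5000  # assumed window size, for illustration only
    for k in (2, 4, 6, 8):
        print(f"k={k}: {4 ** k} possible k-mers, "
              f"coverage bound {kmer_coverage(window_len, k):.3f}")
```

At k = 8 the window can hold fewer than a tenth of the possible motifs, so most high-order frequencies in the window are estimated from zero or one observation.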

RESULTS

We introduce a novel computational method, Interpolated Variable Order Motifs (IVOMs), which exploits compositional biases using variable-order motif distributions and captures the local composition of a sequence more reliably than fixed-order methods. For optimal localization of the boundaries of each predicted region, a second-order, two-state hidden Markov model (HMM) is implemented in a change-point detection framework. We applied the IVOM approach to the genome of Salmonella enterica serovar Typhi CT18, a well-studied prokaryote in terms of HGT events, and we show that the IVOMs outperform state-of-the-art low- and high-order motif methods, predicting not only the already characterized Salmonella Pathogenicity Islands (SPI-1 to SPI-10) but also three novel SPIs (SPI-15, SPI-16, SPI-17) and other HGT events.
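The abstract does not give the IVOM weighting formula, so the following is only a minimal sketch of the general idea of combining motif distributions of several orders. It assumes a relative-entropy (Kullback-Leibler) score per order and weights each order by how well the window can populate its 4^k motifs. The names `kmer_dist`, `kl_divergence` and `interpolated_score`, the pseudocount, the weighting scheme and the synthetic sequences are all illustrative assumptions, not the authors' method or the alien_hunter implementation, and the HMM boundary step is not covered.

```python
import math
import random
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_dist(seq: str, k: int, pseudo: float = 1.0) -> dict[str, float]:
    """Pseudocount-smoothed frequency of every possible k-mer in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(counts[w] for w in words) + pseudo * len(words)
    return {w: (counts[w] + pseudo) / total for w in words}


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Relative entropy D(p || q) over a shared set of motifs."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def interpolated_score(window: str, genome: str, max_k: int = 4) -> float:
    """Weighted mean of per-order divergences between window and genome.

    Assumed weighting: an order counts fully when the window holds at
    least 4**k positions, and proportionally less when it does not.
    """
    weights, scores = [], []
    for k in range(1, max_k + 1):
        n_positions = max(len(window) - k + 1, 0)
        weights.append(min(1.0, n_positions / 4 ** k))
        scores.append(kl_divergence(kmer_dist(window, k), kmer_dist(genome, k)))
    total_w = sum(weights)
    if total_w == 0:
        return 0.0
    return sum(w * s for w, s in zip(weights, scores)) / total_w


# Synthetic data (assumption): a uniform-composition "backbone" and an
# AT-rich segment standing in for horizontally acquired DNA.
rng = random.Random(0)
backbone = "".join(rng.choices(ALPHABET, k=20000))
alien = "".join(rng.choices(ALPHABET, weights=[0.4, 0.1, 0.1, 0.4], k=2000))

native_score = interpolated_score(backbone[:2000], backbone)
alien_score = interpolated_score(alien, backbone)
print(f"native window: {native_score:.4f}, AT-rich window: {alien_score:.4f}")
```

In this toy setting the compositionally atypical window scores higher than a window drawn from the backbone itself. Weighting by window support is one simple way to let low orders dominate when data are scarce; other interpolation schemes are equally plausible.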

AVAILABILITY

The software is available under a GPL license as a standalone application at http://www.sanger.ac.uk/Software/analysis/alien_hunter

CONTACT

gsv@sanger.ac.uk

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

