Micah Hamady, M. D. Betterton, Rob Knight
Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA.
BMC Bioinformatics. 2006 Oct 26;7:476. doi: 10.1186/1471-2105-7-476.
Horizontal gene transfer (HGT) has allowed bacteria to evolve many new capabilities. Because transferred genes perform many medically important functions, such as conferring antibiotic resistance, improved detection of horizontally transferred genes from sequence data would be an important advance. Existing sequence-based methods for detecting HGT focus on changes in nucleotide composition or on differences between gene and genome phylogenies; these methods have high error rates.
First, we introduce a new class of methods for detecting HGT based on the changes in nucleotide substitution rates that occur when a gene is transferred to a new organism. On simulated HGT events, these methods achieve an error rate up to 10 times lower than that of GC content. Using models that are not time-reversible is crucial for detecting HGT. Second, we show that combining multiple predictors of HGT offers substantial improvements over any single predictor, yielding up to an 18-fold improvement in performance (a maximum reduction in error rate from 38% to about 3%). The predictors were combined with the random forests machine learning algorithm, which was used to identify optimal classifiers that separate HGT from non-HGT trees.
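To make the two kinds of signal concrete, the following minimal Python sketch contrasts a compositional measure (GC content) with a crude directional measure of substitution asymmetry. It is an illustration only and not the method used in the paper: the function names are hypothetical, and the ancestral sequence is assumed to be known, whereas in practice it would have to be inferred on a phylogeny. The underlying idea is that a time-reversible process at equilibrium is expected to produce as many i→j as j→i changes, so a strong imbalance hints that the substitution process has shifted.

```python
from itertools import product

BASES = "ACGT"


def gc_content(seq):
    """Fraction of G or C among the unambiguous bases of seq."""
    counted = [b for b in seq.upper() if b in BASES]
    if not counted:
        return 0.0
    return sum(b in "GC" for b in counted) / len(counted)


def directed_counts(ancestor, descendant):
    """Count ancestor->descendant base pairs at each aligned site.

    Assumes the two sequences are already aligned; sites with gaps or
    ambiguous characters are skipped.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to equal length")
    counts = {pair: 0 for pair in product(BASES, repeat=2)}
    for a, d in zip(ancestor.upper(), descendant.upper()):
        if a in BASES and d in BASES:
            counts[(a, d)] += 1
    return counts


def asymmetry(counts):
    """Return 0.0 when i->j and j->i counts match for every base pair,
    and 1.0 when every observed change is one-directional."""
    diff = 0
    total = 0
    for i, a in enumerate(BASES):
        for b in BASES[i + 1:]:
            diff += abs(counts[(a, b)] - counts[(b, a)])
            total += counts[(a, b)] + counts[(b, a)]
    return diff / total if total else 0.0


if __name__ == "__main__":
    ancestor = "AAAATTTTGGCC"    # toy sequences, not real data
    descendant = "GAAACTTTGGCC"  # A->G and T->C changes only
    print(gc_content(ancestor), gc_content(descendant))
    print(asymmetry(directed_counts(ancestor, descendant)))
```

In this toy pair every change runs from A or T towards G or C, so the asymmetry score is at its maximum while GC content rises from one third to one half. A realistic analysis would estimate a full rate matrix on a tree instead of comparing two sequences directly.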
The new class of HGT-detection methods introduced here combines advantages of phylogenetic and compositional HGT-detection techniques. These new techniques offer order-of-magnitude improvements over compositional methods because they are better able to discriminate HGT from non-HGT trees under a wide range of simulated conditions. We also found that combining multiple measures of HGT is essential for detecting a wide range of HGT events. These novel indicators of horizontal transfer will be widely useful in detecting HGT events linked to the evolution of important bacterial traits, such as antibiotic resistance and pathogenicity.