School of Business and Engineering Vaud (HEIG-VD), University of Applied Sciences Western Switzerland (HES-SO), Route de Cheseaux 1, Yverdon-Les-Bains, 1400, Switzerland.
SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.
BMC Bioinformatics. 2018 Nov 20;19(Suppl 14):420. doi: 10.1186/s12859-018-2388-7.
Antibiotic resistance and its rapid dissemination around the world threaten the efficacy of currently used medical treatments and call for novel, innovative approaches to managing multidrug-resistant infections. Phage therapy, i.e., the use of viruses (phages) to specifically infect and kill bacteria during their life cycle, is one of the most promising alternatives to antibiotics. It relies on correctly matching a target pathogenic bacterium with a therapeutic phage, and this matching is a major challenge. Currently, there is no systematic method to efficiently predict whether a phage-bacterium interaction exists, so candidate pairs must be tested empirically in the laboratory. Herein, we present our approach for developing a computational model able to predict, from their genomes, whether a given phage-bacterium pair can interact.
Based on public data from GenBank and phagesDB.org, we collected more than a thousand positive phage-bacterium interactions with their complete genomes. In addition, we generated putative negative (i.e., non-interacting) pairs. From the collected genomes, we extracted a set of informative features based on the distribution of predicted protein-protein interactions and on the primary structure of each protein (e.g., amino-acid frequency, molecular weight, and chemical composition). With these features, we generated multiple candidate datasets to train our algorithms. On this basis, we built predictive models reaching around 90% in F1-score, sensitivity, specificity, and accuracy on the test set under 10-fold cross-validation.
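The primary-structure features and the evaluation metrics named above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`protein_features`, `genome_features`, `pair_features`, `classification_metrics`) are hypothetical, the mass table holds approximate average free amino-acid masses, chemical composition and the protein-protein interaction features are omitted, and averaging per-protein vectors over a genome is an assumed aggregation step. The metrics use the standard confusion-matrix definitions.

```python
from collections import Counter
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Approximate average masses (Da) of the 20 free amino acids (assumed values).
AA_MASS = {
    "A": 89.0932, "C": 121.1582, "D": 133.1027, "E": 147.1293,
    "F": 165.1891, "G": 75.0666, "H": 155.1546, "I": 131.1729,
    "K": 146.1876, "L": 131.1729, "M": 149.2113, "N": 132.1179,
    "P": 115.1305, "Q": 146.1445, "R": 174.2010, "S": 105.0926,
    "T": 119.1192, "V": 117.1463, "W": 204.2252, "Y": 181.1885,
}
WATER = 18.0153  # mass of water released per peptide bond


def clean(protein):
    """Keep only the 20 standard residues (drops X, *, gaps, etc.)."""
    return "".join(aa for aa in protein.upper() if aa in AA_MASS)


def molecular_weight(protein):
    """Sum of free amino-acid masses minus one water per peptide bond."""
    seq = clean(protein)
    if not seq:
        return 0.0
    return sum(AA_MASS[aa] for aa in seq) - (len(seq) - 1) * WATER


def protein_features(protein):
    """20 relative residue frequencies followed by molecular weight."""
    seq = clean(protein)
    counts = Counter(seq)
    freqs = [counts[aa] / len(seq) for aa in AMINO_ACIDS]
    return freqs + [molecular_weight(seq)]


def genome_features(proteins):
    """Hypothetical aggregation: mean of per-protein feature vectors."""
    vectors = [protein_features(p) for p in proteins if clean(p)]
    return [mean(column) for column in zip(*vectors)]


def pair_features(phage_proteins, host_proteins):
    """Concatenate phage and bacterium genome-level features."""
    return genome_features(phage_proteins) + genome_features(host_proteins)


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }
```

In a 10-fold cross-validation, the confusion counts would be collected on each held-out fold and the metrics then averaged or pooled across folds; which of the two the authors used is not stated here.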
These promising results reinforce the hypothesis that machine learning techniques may produce highly predictive models that accelerate the search for interacting phage-bacterium pairs.