Virginia Bioinformatics Institute, Virginia Tech, 1 Washington St, Blacksburg, VA 24061, USA.
Infect Genet Evol. 2011 Jul;11(5):917-23. doi: 10.1016/j.meegid.2011.02.022. Epub 2011 Mar 5.
Infectious diseases result in millions of deaths each year. Physical interactions between pathogen and host proteins often form the basis of such infections. While a number of methods have been proposed for predicting protein-protein interactions (PPIs), they have focused primarily on interactions within a single species (intra-species PPIs).
We present an application of a supervised learning method for predicting physical interactions between host and pathogen proteins, using the human-HIV system. Using a Support Vector Machine with a linear kernel, we explore several features, including domain profiles, protein sequence k-mers, and properties of human proteins in a human PPI network. We achieve the best cross-validation performance when we combine all three of these features. At a precision of 70%, we obtain recall values greater than 40%, depending on the ratio of positive to negative examples used during training. We then use a classifier trained on these features to predict new PPIs between human and HIV proteins. We focus our discussion on those predicted interactions that involve human proteins known to be critical for HIV replication and propagation. Examples of predicted interactions with support in the literature include those necessary for viral attachment to the host membrane and subsequent invasion of the host cell.
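To make the pipeline concrete, here is a minimal Python sketch of the sequence-feature part of such a classifier; it is an illustration under stated assumptions, not the authors' implementation. It assumes each human-HIV protein pair is encoded by concatenating the normalized k-mer frequencies of the two sequences (k = 2 here, an arbitrary choice), and it omits the domain-profile and network features. For training it uses Pegasos, a stochastic sub-gradient solver for the linear SVM objective, swapped in so the sketch needs only the standard library. `recall_at_precision` reports the highest recall over score thresholds whose precision meets a target such as 0.7. All function names and toy sequences are hypothetical.

```python
import random
from collections import Counter

# Hypothetical helpers; Pegasos stands in for whatever SVM solver the authors used.


def kmer_features(seq, k, prefix):
    """Normalized k-mer frequencies of one sequence, as a sparse dict."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values()) or 1
    return {(prefix, kmer): c / total for kmer, c in counts.items()}


def pair_features(host_seq, pathogen_seq, k=2):
    """Concatenate host and pathogen k-mer profiles plus a constant bias feature."""
    feats = kmer_features(host_seq, k, "host")
    feats.update(kmer_features(pathogen_seq, k, "pathogen"))
    feats[("bias",)] = 1.0  # the bias is regularized too; acceptable for a sketch
    return feats


def dot(w, x):
    """Dot product of a sparse weight dict and a sparse feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    Labels in y must be +1 (interacting pair) or -1 (non-interacting pair).
    """
    rng = random.Random(seed)
    w = {}
    t = 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # margin check uses w before the update
            for f in w:  # shrink step from the L2 regularizer
                w[f] *= 1.0 - eta * lam
            if violated:  # hinge-loss sub-gradient step
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w


def recall_at_precision(scores, labels, min_precision):
    """Highest recall over score thresholds whose precision is >= min_precision."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for lab in labels if lab == 1)
    if n_pos == 0:
        return 0.0
    tp = fp = 0
    best = 0.0
    for i, (s, lab) in enumerate(ranked):
        if lab == 1:
            tp += 1
        else:
            fp += 1
        # Only evaluate cuts between distinct scores, so tied pairs stay together.
        if i + 1 < len(ranked) and ranked[i + 1][0] == s:
            continue
        if tp / (tp + fp) >= min_precision:
            best = max(best, tp / n_pos)
    return best


if __name__ == "__main__":
    # Hypothetical toy pairs (not real human or HIV sequences).
    pairs = [("MKRKRLL", "GAGKRE", 1), ("MKRSKRA", "GPKRGE", 1),
             ("MLLAVLA", "GAGAGE", -1), ("MAVLLSA", "GPGAGE", -1)]
    X = [pair_features(h, p) for h, p, _ in pairs]
    y = [lab for _, _, lab in pairs]
    w = train_linear_svm(X, y)
    scores = [dot(w, x) for x in X]
    print("recall at 70% precision:", recall_at_precision(scores, y, 0.7))
```

For brevity the demo scores its own training pairs; an evaluation comparable to the cross-validation described above would score held-out folds instead. One way to vary the ratio of positive to negative examples in this sketch is to subsample the negative pairs before calling `train_linear_svm`. Sparse dicts keep the 20^k-dimensional k-mer space manageable, since a short protein contains only a small fraction of all possible k-mers.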
Unlike intra-species PPIs, host-pathogen PPIs have not yet been experimentally detected on a large scale, though they are likely to play important roles in pathogenesis and disease outcomes. Computational methods that can robustly and accurately predict host-pathogen PPIs hold the promise of guiding future experiments and gaining insights into potential mechanisms of pathogenesis.