Tastan Oznur, Qi Yanjun, Carbonell Jaime G, Klein-Seetharaman Judith
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Pac Symp Biocomput. 2009:516-27.
Human immunodeficiency virus-1 (HIV-1), the causative agent of acquired immune deficiency syndrome (AIDS), relies on human host cell proteins in virtually every aspect of its life cycle. Knowledge of the set of interacting human and viral proteins would greatly contribute to our understanding of the mechanisms of infection and subsequently to the design of new therapeutic approaches. This work is the first attempt to predict the global set of interactions between HIV-1 and human host cellular proteins. We propose a supervised learning framework that draws on multiple data sources, including co-occurrence of functional motifs and their interaction domains and protein classes, gene ontology annotations, posttranslational modifications, tissue distributions and gene expression profiles, topological properties of the human protein in the interaction network, and the similarity of HIV-1 proteins to human proteins' known binding partners. We trained and tested a Random Forest (RF) classifier with this extensive feature set. The model's predictions achieved an average Mean Average Precision (MAP) score of 23%. Among the predicted interactions was, for example, the pair formed by the HIV-1 protein tat and the human vitamin D receptor. This interaction was recently validated experimentally in an independent study. The rank-ordered lists of predicted interacting pairs are a rich source for generating biological hypotheses. Among the novel predictions, transcription regulator activity, immune system process and macromolecular complex were the most significant molecular function, biological process and cellular compartment, respectively. Supplementary material is available at www.cs.cmu.edu/~oznur/hiv/hivPPI.html
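For readers unfamiliar with the MAP score reported above, the following is a minimal sketch of how Mean Average Precision is commonly computed for rank-ordered prediction lists. It is not the authors' evaluation code. The function names and the toy labels are hypothetical, and the abstract does not state whether the average is taken over HIV-1 proteins or over cross-validation runs, so the grouping of lists here is an assumption.

```python
from typing import Sequence


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Average precision of one ranked list.

    ranked_labels holds 1 for a known interaction and 0 otherwise,
    ordered from the highest-scoring prediction to the lowest.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            # Precision measured at the rank of each true interaction.
            precision_sum += hits / rank
    # A list with no true interactions contributes 0 by convention here.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists: Sequence[Sequence[int]]) -> float:
    """Mean of the average precision over several ranked lists."""
    if not ranked_lists:
        return 0.0
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


# Hypothetical toy data: two ranked lists of candidate pairs.
toy_lists = [[1, 0, 1, 0], [0, 1]]
toy_map = mean_average_precision(toy_lists)
```

In this toy case the first list has true interactions at ranks 1 and 3, and the second has one at rank 2, so the score rewards the classifier for placing true pairs near the top of each list.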