Complex network theory for the identification and assessment of candidate protein targets.

Affiliations

Faculty of Health Sciences and Well-being, University of Sunderland, City Campus, Sunderland, SR1 3SD, UK.

Faculty of Computer Science, University of Sunderland, St Peter's Campus, Sunderland, SR6 0DD, UK.

Publication information

Comput Biol Med. 2018 Jun 1;97:113-123. doi: 10.1016/j.compbiomed.2018.04.015. Epub 2018 Apr 26.

DOI:10.1016/j.compbiomed.2018.04.015

PMID:29715596

Abstract

In this work we use complex network theory to provide a statistical model of the connectivity patterns of human proteins and their interaction partners. Our aim is to identify important proteins that may be suitable candidates as drug targets for therapeutic interventions. Target proteins usually have more interaction partners than non-target proteins, but there are no hard-and-fast rules for defining how many interactions a target should have. We devise a statistical measure for identifying hub proteins, and we score our target proteins with Gene Ontology annotations. Important druggable protein targets are likely to share similar biological functions, which can be assessed for their potential therapeutic value. Our system uses complex network measures to provide a statistical analysis of the local and distant neighborhood protein interactions of the potential targets. This approach builds a more accurate model of drug-to-target activity, and therefore of its likely impact on treating diseases. We integrate high-quality protein interaction data from the HINT database and disease-associated proteins from the DrugTarget database. Other sources include biological knowledge from Gene Ontology and drug information from DrugBank. The problem is a very challenging one, since the data are highly imbalanced between target proteins and the more numerous non-targets. We use undersampling on the training data and build Random Forest classifier models, which are used to identify previously unclassified target proteins. We validate and corroborate these findings against the available literature.
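The abstract does not spell out the hub statistic, the neighborhood measures, or the sampling scheme, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes (hypothetically) that a hub is a protein whose degree lies at least two standard deviations above the network's mean degree, that the "local" and "distant" neighborhoods are the proteins exactly one and exactly two interaction steps away, and that undersampling means randomly discarding non-targets until they match the number of known targets. All function names and the toy network are illustrative.

```python
import random
import statistics
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def hub_proteins(adj, z_threshold=2.0):
    """Return proteins whose degree z-score (population SD) is at least z_threshold.

    Assumed hub rule for illustration only; the paper's own measure may differ.
    """
    degrees = {p: len(nbrs) for p, nbrs in adj.items()}
    if not degrees:
        return set()
    values = list(degrees.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:  # every protein has the same degree, so none stands out
        return set()
    return {p for p, d in degrees.items() if (d - mean) / sd >= z_threshold}


def neighborhood_sizes(adj, protein):
    """Count proteins exactly one step away (local) and exactly two steps away (more distant)."""
    first = adj.get(protein, set())
    second = set()
    for nbr in first:
        second |= adj[nbr]
    second -= first          # drop direct partners from the two-step set
    second.discard(protein)  # drop the protein itself
    return len(first), len(second)


def undersample(labels, seed=0):
    """Randomly keep only as many majority-class proteins as there are minority-class ones.

    labels maps protein -> 1 (known target) or 0 (non-target).
    """
    targets = [p for p, y in labels.items() if y == 1]
    nontargets = [p for p, y in labels.items() if y == 0]
    minority, majority = sorted([targets, nontargets], key=len)
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    return minority + rng.sample(majority, len(minority))


if __name__ == "__main__":
    # Toy network: "H" interacts with A to E, plus one extra A-B interaction.
    adj = build_adjacency(
        [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("H", "E"), ("A", "B")]
    )
    print(hub_proteins(adj))             # {'H'}
    print(neighborhood_sizes(adj, "C"))  # (1, 4)
    labels = {"H": 1, "A": 0, "B": 0, "C": 0, "D": 0, "E": 0}
    print(len(undersample(labels)))      # 2
```

In this sketch the balanced protein list would then supply training rows (for example, degree and neighborhood counts as features) to a classifier such as scikit-learn's `RandomForestClassifier`; that step is left out to keep the example dependency-free.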
