School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, People's Republic of China.
Anal Chim Acta. 2015 Apr 29;871:18-27. doi: 10.1016/j.aca.2015.02.032. Epub 2015 Feb 12.
Identifying potential drug target proteins is a crucial step in drug discovery and plays a key role in the study of the molecular mechanisms of disease. Because the majority of proteins exert their functions by interacting with each other, we propose a method to recognize target proteins using the human protein-protein interaction network and graph theory. In the network, edges are weighted by the confidence scores of interactions and vertices by descriptors of protein primary structure. Novel network topological features are defined and used to characterize proteins using existing databases. The widely used minimum redundancy maximum relevance (mRMR) and random forests algorithms are used to select the optimal feature subset and to construct a model for identifying potential drug target proteins at the proteome scale. The accuracies on the training set and test set are 89.55% and 85.23%, respectively. Using the constructed model, 2127 potential drug target proteins have been recognized, and 156 drug target proteins have been validated in a drug target database. In addition, some new drug target proteins can be considered as targets for treating diseases such as mucopolysaccharidosis, non-arteritic anterior ischemic optic neuropathy, Bernard-Soulier syndrome and pseudo-von Willebrand disease. It is anticipated that the proposed method may become a powerful high-throughput virtual screening tool for drug targets.
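As a rough illustration of describing proteins by features of a weighted interaction network, the following minimal Python sketch computes three simple per-protein features on a toy graph. The protein names, confidence scores, descriptor values, and the three features themselves are hypothetical and chosen only for illustration. They are not the novel topological features defined in the paper, and the sketch omits the feature selection and random forests steps.

```python
from collections import defaultdict

# Hypothetical toy data: edge weights stand in for interaction confidence
# scores, vertex weights for a single primary-structure descriptor.
edges = [
    ("P1", "P2", 0.9),
    ("P1", "P3", 0.4),
    ("P2", "P3", 0.7),
    ("P3", "P4", 0.5),
]
vertex_weight = {"P1": 0.2, "P2": 0.8, "P3": 0.5, "P4": 0.1}


def build_adjacency(edge_list):
    """Return an undirected adjacency map: protein -> {neighbor: confidence}."""
    adjacency = defaultdict(dict)
    for u, v, confidence in edge_list:
        adjacency[u][v] = confidence
        adjacency[v][u] = confidence
    return adjacency


def topological_features(adjacency, weights):
    """Compute three illustrative per-protein features (assumed, not the paper's)."""
    features = {}
    for protein, neighbors in adjacency.items():
        # Strength: sum of the confidence scores on incident edges.
        strength = sum(neighbors.values())
        # Confidence-weighted mean of the neighbors' descriptor values.
        neighbor_mean = sum(c * weights[n] for n, c in neighbors.items()) / strength
        features[protein] = {
            "degree": len(neighbors),
            "strength": strength,
            "neighbor_descriptor_mean": neighbor_mean,
        }
    return features


features = topological_features(build_adjacency(edges), vertex_weight)
for protein in sorted(features):
    print(protein, features[protein])
```

In a pipeline of the kind the abstract describes, feature vectors like these would be passed to a feature selection step and then to a classifier. Both steps are left out here.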