Improving compound-protein interaction prediction by building up highly credible negative samples.

Author information

Liu Hui, Sun Jianjiang, Guan Jihong, Zheng Jie, Zhou Shuigeng

Affiliations

Lab of Information Management, Changzhou University, Jiangsu 213164, China; School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore; Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200433, China; and Department of Computer Science and Technology, Tongji University, Shanghai 201804, China.

Publication information

Bioinformatics. 2015 Jun 15;31(12):i221-9. doi: 10.1093/bioinformatics/btv256.

DOI:10.1093/bioinformatics/btv256

PMID:26072486

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4765858/

Abstract

MOTIVATION

Computational prediction of compound-protein interactions (CPIs) is of great importance for drug design and development, as genome-scale experimental validation of CPIs is not only time-consuming but also prohibitively expensive. Although an increasing number of validated interactions is available, the performance of computational prediction approaches is severely impeded by the lack of reliable negative CPI samples. A systematic method for screening reliable negative samples is therefore critical to improving the performance of in silico prediction methods.

RESULTS

This article aims at building up a set of highly credible negative samples of CPIs via an in silico screening method. As most existing computational models assume that similar compounds are likely to interact with similar target proteins, and achieve remarkable performance under that assumption, it is rational to identify potential negative samples based on the converse negative proposition: proteins dissimilar to every known/predicted target of a compound are unlikely to be targeted by that compound, and vice versa. We integrated various resources, including chemical structures, chemical expression profiles and side effects of compounds, and amino acid sequences, protein-protein interaction network and functional annotations of proteins, into a systematic screening framework. We first tested the screened negative samples on six classical classifiers; for both human and Caenorhabditis elegans, all of these classifiers achieved remarkably higher performance on our negative samples than on randomly generated negative samples. We then verified the negative samples on three existing prediction models, namely the bipartite local model, Gaussian kernel profile and Bayesian matrix factorization, and found that the performance of these models also improved significantly on the screened negative samples. Moreover, we validated the screened negative samples on a drug bioactivity dataset. Finally, we derived two sets of new interactions by training a support vector machine classifier on the positive interactions annotated in DrugBank and our screened negative interactions. The screened negative samples and the predicted interactions provide the research community with a useful resource for identifying new drug targets and a helpful supplement to current curated compound-protein databases.
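The converse proposition behind the screening can be illustrated with a small sketch. This is a simplified, hypothetical reading of the rule, not the authors' actual procedure: the function name `screen_negatives`, the single similarity `threshold`, the choice to require both directions ("and vice versa") to hold, the skipping of compounds or proteins with no known partners, and the toy matrices are all assumptions made for illustration. The paper integrates several data sources (chemical structures, expression profiles, side effects, sequences, the PPI network, functional annotations); here they are assumed to be already collapsed into one precomputed compound similarity matrix and one protein similarity matrix.

```python
import numpy as np


def screen_negatives(interactions, compound_sim, protein_sim, threshold=0.3):
    """Hypothetical screen for credible negative compound-protein pairs.

    interactions: binary matrix (n_compounds x n_proteins); 1 marks a
        known or predicted interaction.
    compound_sim, protein_sim: precomputed similarity matrices in [0, 1]
        (assumed to already fuse the various data sources).
    threshold: assumed cutoff; similarities below it count as "dissimilar".
    """
    inter = np.asarray(interactions, dtype=bool)
    c_sim = np.asarray(compound_sim, dtype=float)
    p_sim = np.asarray(protein_sim, dtype=float)
    negatives = []
    for c in range(inter.shape[0]):
        targets = np.flatnonzero(inter[c])  # known targets of compound c
        if targets.size == 0:
            continue  # skip: no evidence either way for an unannotated compound
        for p in range(inter.shape[1]):
            if inter[c, p]:
                continue  # never label a known interaction as negative
            ligands = np.flatnonzero(inter[:, p])  # known ligands of protein p
            if ligands.size == 0:
                continue  # skip: no evidence either way for an unannotated protein
            # Protein side: p is dissimilar to every known target of c.
            protein_far = np.all(p_sim[p, targets] < threshold)
            # Compound side ("vice versa"): c is dissimilar to every known ligand of p.
            compound_far = np.all(c_sim[c, ligands] < threshold)
            if protein_far and compound_far:
                negatives.append((c, p))
    return negatives


# Toy data (made up for illustration): 3 compounds x 3 proteins.
interactions = [[1, 0, 0],
                [0, 1, 0],
                [0, 0, 1]]
compound_sim = [[1.0, 0.7, 0.1],
                [0.7, 1.0, 0.15],
                [0.1, 0.15, 1.0]]
protein_sim = [[1.0, 0.8, 0.1],
               [0.8, 1.0, 0.2],
               [0.1, 0.2, 1.0]]

print(screen_negatives(interactions, compound_sim, protein_sim))
# → [(0, 2), (1, 2), (2, 0), (2, 1)]
```

In the toy data, the pairs (0, 1) and (1, 0) are not returned: proteins 0 and 1 are similar (0.8), so the "similar proteins share ligands" assumption cannot rule those interactions out. Requiring both directions makes the screen stricter (fewer but more credible negatives); accepting either direction alone would be a looser variant.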

AVAILABILITY

Supplementary files are available at: http://admis.fudan.edu.cn/negative-cpi/.

[Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2660/4765858/2971ed4515e8/btv256f1p.jpg]

Similar articles

Effectively Identifying Compound-Protein Interactions by Learning from Positive and Unlabeled Examples.

IEEE/ACM Trans Comput Biol Bioinform. 2018 Nov-Dec;15(6):1832-1843. doi: 10.1109/TCBB.2016.2570211. Epub 2016 May 18.

BMC Bioinformatics. 2019 Feb 4;19(Suppl 13):554. doi: 10.1186/s12859-018-2563-x.

Boosting compound-protein interaction prediction by deep learning.

Methods. 2016 Nov 1;110:64-72. doi: 10.1016/j.ymeth.2016.06.024. Epub 2016 Jul 1.

Computational probing protein-protein interactions targeting small molecules.

Bioinformatics. 2016 Jan 15;32(2):226-34. doi: 10.1093/bioinformatics/btv528. Epub 2015 Sep 28.

A general prediction model for compound-protein interactions based on deep learning.

Front Pharmacol. 2024 Sep 4;15:1465890. doi: 10.3389/fphar.2024.1465890. eCollection 2024.

Predicting adverse drug reactions of combined medication from heterogeneous pharmacologic databases.

BMC Bioinformatics. 2018 Dec 31;19(Suppl 19):517. doi: 10.1186/s12859-018-2520-8.

Improving Compound-Protein Interaction Prediction by Self-Training with Augmenting Negative Samples.

J Chem Inf Model. 2023 Aug 14;63(15):4552-4559. doi: 10.1021/acs.jcim.3c00269. Epub 2023 Jul 17.

Computationally predicting protein-RNA interactions using only positive and unlabeled examples.

J Bioinform Comput Biol. 2015 Jun;13(3):1541005. doi: 10.1142/S021972001541005X. Epub 2015 Feb 8.

Identification of potential drug-targets by combining evolutionary information extracted from frequency profiles and molecular topological structures.

Chem Biol Drug Des. 2020 Aug;96(2):758-767. doi: 10.1111/cbdd.13599. Epub 2020 May 25.

Cited by

CPI-MIF: Compound-Protein Interaction Prediction with Multiview Information Fusion.

ACS Omega. 2025 Jul 13;10(28):30155-30166. doi: 10.1021/acsomega.5c00113. eCollection 2025 Jul 22.

SaeGraphDTI: drug-target interaction prediction based on sequence attribute extraction and graph neural network.

BMC Bioinformatics. 2025 Jul 15;26(1):177. doi: 10.1186/s12859-025-06195-0.

Top-DTI: integrating topological deep learning and large language models for drug-target interaction prediction.

Bioinformatics. 2025 Jul 1;41(Supplement_1):i133-i141. doi: 10.1093/bioinformatics/btaf183.

Prediction of drug-target interactions based on substructure subsequences and cross-public attention mechanism.

PLoS One. 2025 May 30;20(5):e0324146. doi: 10.1371/journal.pone.0324146. eCollection 2025.

WDGBANDTI: A Deep Graph Convolutional Network-Based Bilinear Attention Network for Drug-Target Interaction Prediction with Domain Adaptation.

Interdiscip Sci. 2025 May 23. doi: 10.1007/s12539-025-00714-6.

Sensing Compound Substructures Combined with Molecular Fingerprinting to Predict Drug-Target Interactions.

Interdiscip Sci. 2025 Apr 3. doi: 10.1007/s12539-025-00698-3.

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf122.

GraphBAN: An inductive graph-based approach for enhanced prediction of compound-protein interactions.

Nat Commun. 2025 Mar 18;16(1):2541. doi: 10.1038/s41467-025-57536-9.

DTIAM: a unified framework for predicting drug-target interactions, binding affinities and drug mechanisms.

Nat Commun. 2025 Mar 15;16(1):2548. doi: 10.1038/s41467-025-57828-0.

Prediction of drug target interaction based on under sampling strategy and random forest algorithm.

PLoS One. 2025 Mar 6;20(3):e0318420. doi: 10.1371/journal.pone.0318420. eCollection 2025.
