Wang Yong-Cui, Chen Shi-Long, Deng Nai-Yang, Wang Yong
Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining 810001, China.
College of Science, China Agricultural University, Beijing 100083, China.
Bioinformatics. 2016 Jan 15;32(2):226-34. doi: 10.1093/bioinformatics/btv528. Epub 2015 Sep 28.
With the rapid growth of interactome studies, many interactions can now be measured in a high-throughput manner, and large-scale datasets are available. It is becoming apparent that many different types of interactions can be potential drug targets. Compared with inhibiting a single protein, inhibiting a protein-protein interaction (PPI) promises greater specificity with fewer adverse side effects. It also greatly broadens the drug target search space, which makes drug target discovery more difficult. Computational methods that efficiently provide candidates for further experiments are therefore highly desirable and promise to greatly accelerate the discovery of novel drug targets.
Here, we propose a machine learning method to predict PPI targets on a genome-wide scale. Specifically, we develop a computational method, named PrePPItar, to Predict PPIs as drug targets by uncovering potential associations between drugs and PPIs. First, we survey the databases and manually construct a gold-standard positive dataset of drug-PPI interactions. This effort yields a dataset of 227 associations among 63 PPIs and 113 FDA-approved drugs, which allows us to build models that learn association rules from the data. Second, we characterize drugs by their profiles in chemical structure, ATC-code annotation, and side-effect space, and we represent PPI similarity by a symmetric S-kernel based on protein amino acid sequences. The drugs and PPIs are then correlated through a Kronecker product kernel. Finally, a support vector machine (SVM) is trained to predict novel associations between drugs and PPIs. We validate PrePPItar on the gold-standard dataset by cross-validation. We find that chemical structure, ATC-code, and side-effect information are each predictive of PPI targets. Moreover, integrating multiple data sources increases the coverage of PPI target prediction. Follow-up database searches and pathway analysis indicate that our new predictions are worth future experimental validation.
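The pairwise-kernel construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PrePPItar implementation: drug similarity is taken as Tanimoto similarity on set-valued features (fingerprint bits, ATC codes, side effects) averaged across the three views; protein similarity uses a toy 3-mer overlap as a stand-in for the sequence-based similarity, whose exact form the abstract does not give; and the PPI kernel is the standard symmetric pairwise kernel, which may differ in detail from the paper's S-kernel. All function names and example data are hypothetical.

```python
from statistics import mean


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def kmers(seq: str, k: int = 3) -> set:
    """Set of overlapping k-mers in an amino acid sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def protein_kernel(s1: str, s2: str) -> float:
    # Hypothetical stand-in for a sequence-based protein similarity.
    return tanimoto(kmers(s1), kmers(s2))


def drug_kernel(d1: dict, d2: dict) -> float:
    # Assumption: integrate structure ("fp"), ATC ("atc") and side-effect ("se")
    # views by an unweighted average of per-view Tanimoto similarities.
    return mean(tanimoto(d1[view], d2[view]) for view in ("fp", "atc", "se"))


def ppi_kernel(p: tuple, q: tuple) -> float:
    """Symmetric kernel for unordered protein pairs (a, b) and (c, d):
    K = k(a,c)k(b,d) + k(a,d)k(b,c), invariant to swapping either pair."""
    (a, b), (c, d) = p, q
    return (protein_kernel(a, c) * protein_kernel(b, d)
            + protein_kernel(a, d) * protein_kernel(b, c))


def pair_kernel(x: tuple, y: tuple) -> float:
    """Kronecker product kernel over (drug, PPI) pairs."""
    (drug1, ppi1), (drug2, ppi2) = x, y
    return drug_kernel(drug1, drug2) * ppi_kernel(ppi1, ppi2)


def gram_matrix(pairs: list) -> list:
    """Precomputed kernel matrix over a list of (drug, PPI) pairs."""
    return [[pair_kernel(x, y) for y in pairs] for x in pairs]


# Hypothetical toy data: two drugs and two PPIs given as sequence pairs.
drug_a = {"fp": {1, 2, 3}, "atc": {"L01"}, "se": {"nausea", "rash"}}
drug_b = {"fp": {2, 3, 4}, "atc": {"L01"}, "se": {"nausea"}}
ppi_1 = ("MKTAYIAK", "GSHMLEDP")
ppi_2 = ("GSHMLEDP", "MKTAYIAR")
pairs = [(drug_a, ppi_1), (drug_b, ppi_2)]
K = gram_matrix(pairs)
```

Here `K[0][1]` is 2/3 (drug similarity) times 5/7 (PPI similarity), and swapping the two proteins inside a PPI leaves every entry unchanged. Because products and averages of valid kernels are again valid kernels, a matrix built this way could be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.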
In conclusion, PrePPItar can serve as a useful tool for PPI target discovery and provides a general heterogeneous data integrative framework.
PrePPItar is available at http://doc.aporc.org/wiki/PrePPItar.
ycwang@nwipb.cas.cn or ywang@amss.ac.cn
Supplementary data are available at Bioinformatics online.