Sethi Geetika, Chopra Gaurav, Samudrala Ram
Department of Biomedical Informatics, School of Medicine and Biomedical Sciences, State University of New York (SUNY), 923 Main Street, Buffalo, NY 14203, USA.
Mini Rev Med Chem. 2015;15(8):705-17. doi: 10.2174/1389557515666150219145148.
We examined the effect of eight protein classes (channels, GPCRs, kinases, ligases, nuclear receptors, proteases, phosphatases, and transporters) on the benchmarking performance of the CANDO drug discovery and repurposing platform (http://protinfo.org/cando). The first version of the CANDO platform uses a matrix of predicted interactions between 48278 proteins and 3733 human-ingestible compounds (including FDA-approved drugs and supplements) that map to 2030 indications/diseases. The matrix is generated with a hierarchical cheminformatic and bioinformatic fragment-based docking with dynamics protocol (more than one billion predicted interactions considered). The platform treats similarity of compound-proteome interaction signatures as indicative of similar functional behavior, and benchmarking accuracy is calculated across the 1439 indications/diseases with more than one approved drug. The platform yields a significant correlation (0.99, p-value < 0.0001) between the number of proteins considered and the benchmarking accuracy obtained, indicating the importance of multitargeting for drug discovery. Average benchmarking accuracies range from 6.2% to 7.6% for the eight classes when the top 10 ranked compounds are considered, in contrast to a range of 5.5% to 11.7% for the comparison/control sets consisting of the 10, 100, 1000, and 10000 single best performing proteins. These results are generally two orders of magnitude better than the average accuracy of 0.2% obtained when randomly generated (fully scrambled) matrices are used. Different indications perform well when different classes are used, but the best accuracies (up to 11.7% for the top 10 ranked compounds) are achieved when a combination of classes containing the broadest distribution of protein folds is used. Our results illustrate the utility of the CANDO approach and of considering different protein classes when devising indication-specific protocols for drug repurposing as well as drug discovery.
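The benchmarking idea described in the abstract can be illustrated with a minimal sketch. This is not the CANDO implementation: the function names (`top_k_accuracy`, `scrambled`), the use of Euclidean distance as the signature similarity measure, and the rule that a drug counts as a hit when another drug approved for the same indication appears among its top-k most similar compounds are all assumptions made for illustration. The toy matrix is invented and carries no real data.

```python
import numpy as np


def top_k_accuracy(signatures, indications, k=10):
    """Average per-indication accuracy (assumed definition, see lead-in).

    signatures  : (n_compounds, n_proteins) array of interaction scores
    indications : dict mapping indication name -> list of compound row indices
    k           : number of most similar compounds inspected per drug
    """
    sig = np.asarray(signatures, dtype=float)
    # Squared Euclidean distance between every pair of compound signatures.
    sq = np.sum(sig ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (sig @ sig.T)
    np.fill_diagonal(d2, np.inf)  # a compound is never its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]

    per_indication = []
    for drugs in indications.values():
        if len(drugs) < 2:
            continue  # only indications with more than one approved drug
        drug_set = set(drugs)
        hits = sum(
            1 for d in drugs if drug_set.intersection(neighbours[d].tolist())
        )
        per_indication.append(hits / len(drugs))
    return float(np.mean(per_indication)) if per_indication else 0.0


def scrambled(signatures, rng):
    """Fully scrambled control: permute every entry of the matrix."""
    sig = np.asarray(signatures, dtype=float)
    return rng.permutation(sig.ravel()).reshape(sig.shape)


# Invented toy data: 6 compounds described by 2 proteins each.
toy = np.array(
    [[0, 0], [0, 1], [2, 0], [10, 10], [10, 11], [12, 10]], dtype=float
)
toy_indications = {"A": [0, 1], "B": [2, 5], "C": [4]}

if __name__ == "__main__":
    print(top_k_accuracy(toy, toy_indications, k=1))
    control = scrambled(toy, np.random.default_rng(0))
    print(top_k_accuracy(control, toy_indications, k=1))
```

On the toy data with k=1, indication A scores 1.0, indication B scores 0.0, and indication C is skipped because it has a single drug, so the average is 0.5. Under the same assumptions, restricting the analysis to one protein class would amount to selecting the corresponding columns of the matrix before calling `top_k_accuracy`.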