Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332-2000, United States.
J Chem Inf Model. 2018 Nov 26;58(11):2343-2354. doi: 10.1021/acs.jcim.8b00309. Epub 2018 Oct 16.
Computational approaches for predicting protein-ligand interactions can facilitate drug lead discovery and drug target determination. We have previously developed a threading/structure-based approach, FINDSITEcomb, for the virtual ligand screening of proteins that has been extensively experimentally validated. Even when low-resolution predicted protein structures are employed, FINDSITEcomb has the advantage of being faster and more accurate than traditional high-resolution structure-based docking methods. It also overcomes the limitations of traditional QSAR methods, which require a known set of seed ligands that bind to the given protein target. Here, we further improve FINDSITEcomb by enhancing its template ligand selection from the PDB/DrugBank/ChEMBL libraries of known protein-ligand interactions by (1) parsing the template proteins and their corresponding binding ligands in the DrugBank and ChEMBL libraries into domains, so that ligands whose domains are falsely matched to the targets are not selected as template ligands; and (2) applying various thresholds in the structure comparison process to filter out falsely matched template structures, and thus their corresponding ligands, during template ligand selection. With a sequence identity cutoff of 30% of target to templates and modeled target structures, FINDSITEcomb2.0 is shown to significantly improve upon FINDSITEcomb on the DUD-E benchmark set by increasing the 1% enrichment factor from 16.7 to 22.1, with a p-value of 4.3 × 10⁻³ by the Student t-test. With an 80% sequence identity cutoff of target to templates for the DUD-E set and modeled target structures, FINDSITEcomb2.0, having a 1% ROC enrichment factor of 52.39, also outperforms state-of-the-art methods that employ machine learning, such as a deep convolutional neural network (CNN), which has an enrichment of 29.65. Thus, FINDSITEcomb2.0 represents a significant improvement in the state of the art.
The FINDSITEcomb2.0 web service is freely available for academic users at http://pwp.gatech.edu/cssb/FINDSITE-COMB-2.
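For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a top-fraction enrichment factor is commonly computed for a ranked virtual screening list. It assumes the widely used definition (hit rate among the top-scored 1% of compounds divided by the hit rate in the whole library), which may differ in detail from the authors' implementation and is not the same as the ROC enrichment factor also reported. The function name `enrichment_factor` and its parameters are hypothetical and chosen for illustration only.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Top-fraction enrichment factor for a ranked screening list.

    scores    : predicted scores, higher means more likely to bind
    is_active : parallel list of booleans (True for known actives)
    fraction  : fraction of the library treated as the "top" (0.01 = 1%)

    Assumed definition (not taken from the paper):
        EF = (actives in top / size of top) / (all actives / library size)
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")

    # Size of the top slice; always inspect at least one compound.
    n_top = max(1, int(round(n_total * fraction)))

    # Rank compounds from best to worst score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(1 for _, flag in ranked[:n_top] if flag)

    # Algebraically equal to (hits_in_top / n_top) / (n_actives / n_total).
    return hits_in_top * n_total / (n_top * n_actives)
```

Under this definition, a value of 1 corresponds to random ranking, and the maximum attainable value at 1% is 100 when the library contains enough actives to fill the entire top slice.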