Zhang Haiping, Lin Xiao, Wei Yanjie, Zhang Huiling, Liao Linbu, Wu Hao, Pan Yi, Wu Xuli
Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
School of Medicine, Shenzhen University, Shenzhen, China.
Front Mol Biosci. 2022 Jun 1;9:872086. doi: 10.3389/fmolb.2022.872086. eCollection 2022.
Computational methods that run on affordable resources are highly desirable for identifying active drug leads among millions of compounds. This requires a model that is both highly efficient and reasonably accurate, a combination that most current methods do not achieve. In real virtual screening (VS) scenarios, a useful method should select active compounds at a rate much higher than random chance. Here, we systematically evaluate the performance of our previously developed DFCNN model in large-scale virtual screening. The results show that, at a score cutoff of 0.99, our method achieves on average approximately 22 times the success rate of random chance. Of the 102 test cases, 10 have more than 98 times the success rate of a random guess. Interestingly, in three cases, the prediction success rate is 99 times that of a random guess at a score cutoff of 0.99. This indicates that in most situations, after our extremely large-scale VS, the dataset can be reduced 20- to 100-fold for the next step of virtual screening based on docking or molecular dynamics (MD) simulation. Furthermore, we verified the computational method experimentally by finding several active inhibitors of Trypsin I Protease. We also show a proof-of-concept application in drug screening. The results indicate the great potential of this method as the first step of a real drug development workflow. Moreover, DFCNN takes only about 0.0000225 s per protein-compound prediction on average with 80 Intel CPU cores (2.00 GHz) and 60 GB RAM, which is at least tens of thousands of times faster than AutoDock Vina or Schrödinger high-throughput virtual screening. An online webserver based on DFCNN for large-scale screening is available at http://cbblab.siat.ac.cn/DFCNN/index.php.
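To make the "times the success rate of random chance" figure concrete, the following is a minimal sketch of how such a ratio could be computed. It assumes the figure corresponds to a conventional enrichment factor: the fraction of actives among compounds scoring at or above the cutoff, divided by the fraction of actives in the whole library. The abstract does not give the exact definition, so this is an assumption. The function names `enrichment_factor` and `screening_time_seconds` are hypothetical and are not part of DFCNN; only the 0.99 cutoff and the 0.0000225 s per-prediction time come from the text.

```python
import math


def enrichment_factor(scores, labels, cutoff=0.99):
    """Hit rate among compounds with score >= cutoff, relative to the
    hit rate expected from random selection (assumed definition).

    scores: predicted scores in [0, 1], one per compound
    labels: 1 for a known active, 0 for an inactive or decoy
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Labels of the compounds that pass the score cutoff.
    selected = [label for score, label in zip(scores, labels) if score >= cutoff]
    total_actives = sum(labels)

    # Undefined when nothing is selected or the library has no actives.
    if not selected or total_actives == 0:
        return math.nan

    hit_rate_selected = sum(selected) / len(selected)
    hit_rate_random = total_actives / len(labels)
    return hit_rate_selected / hit_rate_random


def screening_time_seconds(n_pairs, seconds_per_pair=0.0000225):
    """Estimated wall time for n_pairs protein-compound predictions,
    using the average per-prediction time quoted in the abstract."""
    return n_pairs * seconds_per_pair


if __name__ == "__main__":
    # Toy library: 10 compounds, 2 actives; 2 compounds pass the cutoff,
    # 1 of which is active.
    scores = [0.995, 0.999, 0.5, 0.2, 0.1, 0.3, 0.4, 0.6, 0.7, 0.8]
    labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    print(enrichment_factor(scores, labels))
    print(screening_time_seconds(1_000_000))
```

Under this assumed definition, a value of 22 means that a compound picked from the set passing the cutoff is 22 times more likely to be active than one picked at random from the full library. The quoted timing applies to the stated hardware (80 CPU cores, 60 GB RAM), so the time estimate would not carry over directly to other machines.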