Krasoulis Agamemnon, Antonopoulos Nick, Pitsikalis Vassilis, Theodorakis Stavros
DeepLab, Leoforos Syngrou 106, Athens 117 41, Greece.
J Chem Inf Model. 2022 Oct 10;62(19):4642-4659. doi: 10.1021/acs.jcim.2c01057. Epub 2022 Sep 26.
Computational methods for virtual screening can dramatically accelerate early-stage drug discovery by identifying potential hits for a specified target. Docking algorithms traditionally use physics-based simulations to address this challenge by estimating the binding orientation of a query protein-ligand pair and a corresponding binding affinity score. In recent years, classical and modern machine learning architectures have shown potential for outperforming traditional docking algorithms. Nevertheless, most learning-based algorithms still rely on the availability of the protein-ligand complex binding pose, typically estimated via docking simulations, which severely slows down the overall virtual screening process. A family of algorithms that process target information at the amino acid sequence level avoids this requirement, but at the cost of processing protein data at a higher representation level. We introduce deep neural virtual screening (DENVIS), an end-to-end pipeline for virtual screening using graph neural networks (GNNs). By performing experiments on two benchmark databases, we show that our method performs competitively with several docking-based, machine learning-based, and hybrid docking/machine learning-based algorithms. By avoiding the intermediate docking step, DENVIS achieves screening times several orders of magnitude faster (i.e., higher throughput) than both docking-based and hybrid models. When compared to an amino acid sequence-based machine learning model with comparable screening times, DENVIS achieves dramatically better performance. Key elements of our approach include protein pocket modeling using a combination of atomic and surface features, the use of model ensembles, and data augmentation via artificial negative sampling during model training.
In summary, DENVIS achieves virtual screening performance competitive with the state of the art, while offering the potential to scale to billions of molecules using minimal computational resources.
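The abstract names artificial negative sampling as a training-time data augmentation but does not describe how the negatives are drawn. The sketch below illustrates one common way such sampling can be done, under a stated assumption: a ligand known to bind one target is paired with a different target and labeled as a presumed non-binder. The function name `sample_artificial_negatives`, its parameters, and the toy identifiers are hypothetical and are not taken from DENVIS.

```python
import random
from typing import Dict, List, Set, Tuple


def sample_artificial_negatives(
    actives: Dict[str, Set[str]],
    n_per_positive: int = 1,
    seed: int = 0,
) -> List[Tuple[str, str, int]]:
    """Build (target, ligand, label) training triples.

    ASSUMPTION (not from the paper): negatives are created by cross-pairing,
    i.e. a ligand known to bind some other target is presumed inactive
    against the current target and given label 0.
    """
    rng = random.Random(seed)
    # Pool of every ligand seen with any target, sorted for reproducibility.
    all_ligands = sorted(set().union(*actives.values()))
    examples: List[Tuple[str, str, int]] = []
    for target in sorted(actives):
        known = actives[target]
        # Candidate decoys: ligands not known to bind this target.
        candidates = [lig for lig in all_ligands if lig not in known]
        for ligand in sorted(known):
            examples.append((target, ligand, 1))  # known active
            k = min(n_per_positive, len(candidates))
            for decoy in rng.sample(candidates, k):
                examples.append((target, decoy, 0))  # presumed non-binder
    return examples


# Toy data: target IDs mapped to the ligands known to bind them.
toy_actives = {
    "P1": {"ligA", "ligB"},
    "P2": {"ligC"},
    "P3": {"ligD", "ligE"},
}
toy_pairs = sample_artificial_negatives(toy_actives, n_per_positive=2)
```

Cross-paired negatives are only presumed inactive, so a small fraction may be true binders; in practice the ratio of negatives to positives is a tunable choice.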