Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA.
Protein Sci. 2024 Jan;33(1):e4869. doi: 10.1002/pro.4869.
Protein function annotation and drug discovery often involve finding small-molecule binders. In the early stages of drug discovery, virtual ligand screening (VLS) is frequently applied to identify possible hits before experimental testing. Our recent ligand homology modeling (LHM)-machine learning VLS method, FRAGSITE, outperformed approaches that combine traditional docking (to generate protein-ligand poses) with deep learning scoring functions (to rank ligands), but a more robust approach that can identify a more diverse set of binding ligands is still needed. Here, we describe FRAGSITE2, which shows significant improvement on protein targets that lack known small-molecule binders and have no confidently identified LHM template ligands, when benchmarked on two commonly used VLS datasets. For both the DUD-E and DEKOIS2.0 sets, considering ligands with a Tanimoto coefficient (TC) < 0.7 to the template ligands, the 1% enrichment factor (EF1%) of FRAGSITE2 is significantly better than that of FINDSITE, an earlier LHM algorithm. For the DUD-E set, FRAGSITE2 also shows a better ROC enrichment factor and AUPR (area under the precision-recall curve) than the deep learning DenseFS scoring function. Compared with RF-score-VS on the 76-target subset of DEKOIS2.0, using ligands with a TC < 0.99 to the DUD-E training ligands, FRAGSITE2 has double the EF1%. Its boosted tree regression method provides more robust performance than a deep learning multilayer perceptron method. When compared with a pretrained language model for protein target features, FRAGSITE2 also shows much better performance. Thus, FRAGSITE2 is a promising approach that can discover novel hits for protein targets. FRAGSITE2's web service is freely available to academic users at http://sites.gatech.edu/cssb/FRAGSITE2.
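The two quantities the benchmarks rest on, the enrichment factor and the Tanimoto coefficient, can be illustrated with a minimal sketch. This is not FRAGSITE2 code: the function names, the set-of-bits fingerprint representation, and the choice to round the top-fraction cutoff down (with a minimum of one compound) are assumptions made for illustration. The enrichment factor is taken in its usual form, the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole library.

```python
import math
from typing import Sequence, Set


def enrichment_factor(scores: Sequence[float],
                      is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """Enrichment factor at the given top fraction (0.01 gives the 1% EF).

    EF = (actives in top / size of top) / (all actives / library size).
    Higher scores are assumed to mean "more likely to bind".
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")

    # Assumption: round the cutoff down, but always keep at least one compound.
    n_top = max(1, math.floor(n_total * fraction + 1e-9))

    # Rank compounds from best to worst score and count actives in the top slice.
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    hits_top = sum(1 for _, active in ranked[:n_top] if active)

    return (hits_top / n_top) / (n_actives / n_total)


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # Assumption: two empty fingerprints count as dissimilar.
    return len(bits_a & bits_b) / len(union)


def is_dissimilar_to_templates(ligand_bits: Set[int],
                               template_bits: Sequence[Set[int]],
                               cutoff: float = 0.7) -> bool:
    """True if the ligand has TC < cutoff to every template ligand."""
    return all(tanimoto(ligand_bits, t) < cutoff for t in template_bits)
```

With a library of 200 compounds containing 10 actives, the top 1% is 2 compounds; if both are active, the sketch gives (2/2)/(10/200) = 20, which is also the largest value the 1% enrichment factor can take for that library.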