Performance of machine-learning scoring functions in structure-based virtual screening.

Authors

Wójcikowski Maciej, Ballester Pedro J, Siedlecki Pawel

Affiliations

Institute of Biochemistry and Biophysics PAS, Pawinskiego 5a, 02-106 Warsaw, Poland.

Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France.

Publication

Sci Rep. 2017 Apr 25;7:46710. doi: 10.1038/srep46710.

Abstract

Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small, tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15,426 active and 893,897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show that RF-Score-VS can substantially improve virtual screening performance: the top 1% of molecules ranked by RF-Score-VS gives a 55.6% hit rate, whereas the top 1% ranked by Vina gives only 16.2%. At smaller fractions the difference is even larger: the top 0.1% ranked by RF-Score-VS achieves an 88.6% hit rate, versus 27.5% for Vina. In addition, RF-Score-VS predicts measured binding affinity much better than Vina (Pearson correlation of 0.56 and -0.18, respectively). Lastly, we test RF-Score-VS on an independent test set from the DEKOIS benchmark and observe comparable results. We provide full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary).
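The hit rates quoted in the abstract come from ranking every docked molecule by its score and counting how many actives fall within the top fraction of the list; the affinity comparison uses the Pearson correlation between predicted and measured values. The following is a minimal Python sketch of both metrics, assuming that a higher score means a more likely binder and that the size of the top fraction is rounded to the nearest whole molecule. The function names `hit_rate_at_top` and `pearson_r` and the toy data are hypothetical illustrations, not code or data from RF-Score-VS.

```python
import math
from typing import Sequence


def hit_rate_at_top(scores: Sequence[float], is_active: Sequence[bool],
                    fraction: float) -> float:
    """Return the fraction of actives among the top-ranked `fraction` of molecules.

    Hypothetical helper: assumes a higher score means a more likely binder.
    """
    if not scores or len(scores) != len(is_active):
        raise ValueError("scores and labels must be non-empty and of equal length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Size of the selected subset: rounded, but never fewer than one molecule.
    n_top = max(1, round(len(scores) * fraction))
    # Molecule indices sorted from best (highest) to worst score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:n_top] if is_active[i])
    return hits / n_top


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy library of 10 molecules (made-up values, not data from the paper).
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    actives = [True, False, True] + [False] * 7
    print(hit_rate_at_top(scores, actives, 0.2))  # top 2 molecules, 1 active -> 0.5
    print(pearson_r([1, 2, 3], [2, 4, 6]))        # perfectly linear -> 1.0
```

Docking programs such as Vina report binding energies where more negative is better, so those scores would need to be negated before being passed to this sketch.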
