Universität Hamburg, Faculty of Mathematics, Informatics and Natural Sciences, Department of Informatics, Center for Bioinformatics, 20146, Hamburg, Germany.
University of Bergen, Department of Chemistry, N-5020, Bergen, Norway.
Mol Inform. 2020 Apr;39(4):e1900103. doi: 10.1002/minf.201900103. Epub 2019 Nov 8.
Protein flexibility and solvation pose major challenges to docking algorithms and scoring functions. One established strategy for addressing these challenges is to dock against multiple protein conformations (all-against-all ensemble docking). Recent studies have shown that the performance of ensemble docking can be improved by selecting the most relevant protein structures for docking. In search of a robust approach to protein structure selection, we developed an integrated mAchine Learning AnD DockINg approach (ALADDIN). ALADDIN employs a battery of random forest classifiers to select, individually for each compound of interest, the single most suitable protein structure for docking from an ensemble of protein structures. On three of the four investigated targets, ALADDIN outperformed the best single-structure docking runs, ensemble docking and a similarity-based docking approach, with area under the receiver operating characteristic curve (AUC) values up to 0.15, 0.11 and 0.16 higher, respectively. Only for cytochrome P450 3A4 did ALADDIN, like all of the other tested approaches, fail to achieve decent performance. ALADDIN can be particularly useful for structure-based virtual screening of malleable proteins, including kinases, some viral enzymes and anti-targets.
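The per-compound selection step and the AUC metric described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one model per protein structure, each returning an estimated probability that docking a given compound into that structure is suitable, and it picks the structure with the highest probability. The names `StructureModel`, `select_structure`, `roc_auc` and the toy models are hypothetical, and the plain functions stand in for trained random forest classifiers. How ALADDIN actually combines its classifiers is not stated in the abstract.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: maps a compound's feature vector to the estimated
# probability that a given protein structure is suitable for docking it.
# In practice this role would be played by a trained classifier.
StructureModel = Callable[[Sequence[float]], float]


def select_structure(features: Sequence[float],
                     models: Dict[str, StructureModel]) -> str:
    """Return the ID of the structure whose model gives the highest probability."""
    return max(models, key=lambda structure_id: models[structure_id](features))


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the fraction of (active, inactive) pairs in which the active
    compound has the higher score; ties count as half a win.
    Assumes higher scores mean 'more likely active'."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(1.0 if a > i else 0.5 if a == i else 0.0
               for a in actives for i in inactives)
    return wins / (len(actives) * len(inactives))


# Toy stand-ins for two per-structure models (assumptions for illustration only).
toy_models: Dict[str, StructureModel] = {
    "structure_A": lambda f: f[0],
    "structure_B": lambda f: 1.0 - f[0],
}

chosen = select_structure([0.8], toy_models)
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In a full workflow, each compound would be docked only into its selected structure, and the resulting scores would be passed to a function such as `roc_auc` to compare against single-structure and ensemble docking.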