Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France.
Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorferstrasse 65, 88397, Biberach an der Riss, Germany.
J Comput Aided Mol Des. 2019 Mar;33(3):331-343. doi: 10.1007/s10822-019-00188-x. Epub 2019 Feb 9.
The previously reported procedure for generating "universal" Generative Topographic Maps (GTMs) of the drug-like chemical space is in practice a multi-task learning process, in which both operational GTM parameters (for example, map grid size) and hyperparameters (key example: the molecular descriptor space to be used) are chosen by an evolutionary process in order to fit and select "universal" GTM manifolds. After selection (a one-time task aimed at optimizing the compromise in neighborhood behavior compliance over a large pool of diverse biological targets), the manifolds are ready to provide "fit-free" predictive models for any further use. Using any structure-activity set, irrespective of whether the associated target served at the map fitting stage or not, the generation or "coloring" of a property landscape enables prediction of the property for any external molecule, with zero additional fittable parameters involved. While previous works have reported the excellent behavior of such models in aggressive three-fold cross-validation assessments of their predictive power, the present work explores their behavior in Virtual Screening (VS), here simulated using external DUD ligand and decoy series that are fully disjoint from the ChEMBL-extracted landscape coloring sets. Beyond the rather robust results of the universal GTM manifolds in this challenge, it could be shown that the descriptor spaces selected by the evolutionary multi-task learner were intrinsically able to serve as excellent support for many other VS procedures, ranging from parameter-free similarity searching, through local (target-specific) GTM models, to parameter-rich, nonlinear Random Forest and Neural Network approaches.
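The "fit-free" landscape coloring described in the abstract can be illustrated with a minimal sketch. It assumes that a fitted GTM has already produced a responsibility matrix (one row per molecule, one column per map node); the function names `color_landscape` and `predict_from_landscape` are hypothetical, and the responsibility-weighted averaging shown here is one common way to build a GTM property landscape, not necessarily the exact scheme used in the paper.

```python
import numpy as np


def color_landscape(resp_train, y_train):
    """Color each map node with the responsibility-weighted mean property.

    resp_train: (n_molecules, n_nodes) responsibilities from a fitted GTM
                (assumed to be precomputed; not part of this sketch).
    y_train:    (n_molecules,) property values of the coloring set.
    Nodes that receive no responsibility stay NaN (empty map regions).
    """
    resp_train = np.asarray(resp_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    node_mass = resp_train.sum(axis=0)       # cumulated responsibility per node
    node_sum = resp_train.T @ y_train        # responsibility-weighted property sum
    node_values = np.full(node_mass.shape, np.nan)
    np.divide(node_sum, node_mass, out=node_values, where=node_mass > 0)
    return node_values


def predict_from_landscape(resp_new, node_values):
    """Predict a property for external molecules from a colored landscape.

    No parameter is fitted here: the prediction is a weighted average of
    node values, using the external molecule's own responsibilities.
    Molecules that fall only on empty nodes get NaN.
    """
    resp_new = np.asarray(resp_new, dtype=float)
    populated = ~np.isnan(node_values)
    r = resp_new[:, populated]
    mass = r.sum(axis=1)
    pred = np.full(resp_new.shape[0], np.nan)
    np.divide(r @ node_values[populated], mass, out=pred, where=mass > 0)
    return pred


# Toy example: 3 map nodes, 2 coloring molecules (1.0 = active, 0.0 = inactive).
resp_train = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
y_train = np.array([1.0, 0.0])
landscape = color_landscape(resp_train, y_train)

# Two external molecules: one between nodes 0 and 1, one on the empty node 2.
resp_new = np.array([[0.5, 0.5, 0.0],
                     [0.0, 0.0, 1.0]])
scores = predict_from_landscape(resp_new, landscape)
```

In a virtual screening setting, external molecules would then be ranked by such scores; how molecules landing in empty map regions are handled is a design choice left open in this sketch.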