A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia.
Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University and University Hospital in Olomouc, 77900 Olomouc, Czech Republic.
Molecules. 2020 Jan 17;25(2):385. doi: 10.3390/molecules25020385.
Pharmacophore modeling is usually regarded as a special type of virtual screening that has no probabilistic nature: a match of at least one conformation of a molecule to a pharmacophore is taken as evidence of its bioactivity. We show that pharmacophores can be treated as one-class machine learning models, and that a probability reflecting the model's confidence can be assigned to a pharmacophore on the basis of its precision in identifying active compounds on a calibration set. Two schemes (Max and Mean) for calculating the probability of a consensus prediction from individual pharmacophore models were proposed. Both schemes correspond, to some extent, to commonly used consensus approaches such as the common hit approach or the one based on a logical OR operation uniting the hit lists of individual models. Unlike some known approaches, the proposed ones can rank compounds retrieved by multiple models. The approaches were benchmarked on multiple ChEMBL datasets used for ligand-based pharmacophore modeling and externally validated on the corresponding DUD-E datasets. The influence of pharmacophore complexity, and of pharmacophore performance on the calibration set, on virtual screening results was analyzed. The Max and Mean approaches showed better early enrichment than the commonly used approaches. Thus, a well-performing, easy-to-implement, probabilistic alternative to existing approaches for pharmacophore-based virtual screening was proposed.
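The consensus idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the exact formulas, so several details are assumptions: each pharmacophore's probability is taken to be its precision on the calibration set (TP / (TP + FP)), Max is read as the highest probability among the models that matched a compound, and Mean is read as the average over all models with non-matching models counted as zero. The function names `calibration_precision` and `consensus_scores` and all identifiers in the toy data are hypothetical.

```python
from typing import Dict, Set


def calibration_precision(hits: Set[str], actives: Set[str]) -> float:
    """Precision of one pharmacophore on a calibration set: TP / (TP + FP).

    hits    -- compounds from the calibration set matched by the pharmacophore
    actives -- compounds in the calibration set known to be active
    """
    if not hits:
        return 0.0  # a model that retrieves nothing gets zero confidence (assumption)
    return len(hits & actives) / len(hits)


def consensus_scores(
    matches: Dict[str, Set[str]],
    probability: Dict[str, float],
    scheme: str = "max",
) -> Dict[str, float]:
    """Score every compound retrieved by at least one pharmacophore.

    matches     -- model name -> compounds it matched in the screened library
    probability -- model name -> probability assigned from calibration precision
    scheme      -- "max" or "mean" (assumed readings of the two schemes)
    """
    if scheme not in ("max", "mean"):
        raise ValueError(f"unknown scheme: {scheme!r}")
    retrieved = set().union(*matches.values())
    scores = {}
    for compound in retrieved:
        probs = [probability[m] for m, hit in matches.items() if compound in hit]
        if scheme == "max":
            scores[compound] = max(probs)
        else:
            # Assumption: models that did not match contribute 0 to the average.
            scores[compound] = sum(probs) / len(matches)
    return scores


# Toy calibration data (hypothetical identifiers).
actives = {"a1", "a2", "a3"}
probability = {
    "p1": calibration_precision({"a1", "a2", "x1", "x2"}, actives),
    "p2": calibration_precision({"a1"}, actives),
}

# Toy screening results: which library compounds each model matched.
matches = {"p1": {"c1", "c2"}, "p2": {"c2", "c3"}}

max_scores = consensus_scores(matches, probability, "max")
mean_scores = consensus_scores(matches, probability, "mean")

# Rank compounds by descending consensus score.
ranking = sorted(mean_scores, key=mean_scores.get, reverse=True)
```

Under these assumptions, the Max scheme retrieves the same compounds as a logical OR over the hit lists but orders them by the most confident matching model, while the Mean scheme rewards compounds matched by several models, which is what relates it to the common hit approach.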