Dablander Markus, Hanser Thierry, Lambiotte Renaud, Morris Garrett M
Mathematical Institute, University of Oxford, Andrew Wiles Building, Radcliffe Observatory Quarter (550), Woodstock Road, Oxford, OX2 6GG, UK.
Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS, UK.
J Cheminform. 2023 Apr 17;15(1):47. doi: 10.1186/s13321-023-00708-w.
Pairs of similar compounds that only differ by a small structural modification but exhibit a large difference in their binding affinity for a given target are known as activity cliffs (ACs). It has been hypothesised that QSAR models struggle to predict ACs and that ACs thus form a major source of prediction error. However, the AC-prediction power of modern QSAR methods and its quantitative relationship to general QSAR-prediction performance are still underexplored. We systematically construct nine distinct QSAR models by combining three molecular representation methods (extended-connectivity fingerprints, physicochemical-descriptor vectors and graph isomorphism networks) with three regression techniques (random forests, k-nearest neighbours and multilayer perceptrons). We then use each resulting model to classify pairs of similar compounds as ACs or non-ACs and to predict the activities of individual molecules in three case studies: dopamine receptor D2, factor Xa, and SARS-CoV-2 main protease.
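The model grid and the pair-classification step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the short labels (`ECFP`, `PDV`, `GIN`, `RF`, `kNN`, `MLP`), the function name `classify_pair`, and the threshold of 1.0 activity units are all assumptions made for the example, since the abstract does not state how an AC is thresholded.

```python
from itertools import product

# Assumed short labels for the three representations and three regressors
# named in the abstract; the abbreviations themselves are hypothetical.
REPRESENTATIONS = ["ECFP", "PDV", "GIN"]
REGRESSORS = ["RF", "kNN", "MLP"]

# Every representation is paired with every regressor: 3 x 3 = 9 models.
MODEL_GRID = [f"{rep}+{reg}" for rep, reg in product(REPRESENTATIONS, REGRESSORS)]


def classify_pair(pred_a, pred_b, threshold=1.0):
    """Label a pair of similar compounds as an activity cliff (True) when the
    two predicted activities differ by more than `threshold`.

    The threshold value and its units are assumptions for illustration only.
    """
    return abs(pred_a - pred_b) > threshold


if __name__ == "__main__":
    print(len(MODEL_GRID), "models:", MODEL_GRID)
    print(classify_pair(6.0, 8.5))  # large predicted gap
    print(classify_pair(6.0, 6.4))  # small predicted gap
```

The point of the sketch is that a regression model becomes an AC classifier simply by comparing its two activity predictions for a pair, so no separate classifier needs to be trained.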
Our results provide strong support for the hypothesis that QSAR models do indeed frequently fail to predict ACs. We observe low AC-sensitivity amongst the evaluated models when the activities of both compounds are unknown, but a substantial increase in AC-sensitivity when the actual activity of one of the compounds is given. Graph isomorphism features are found to be competitive with or superior to classical molecular representations for AC-classification and can thus be employed as baseline AC-prediction models or simple compound-optimisation tools. For general QSAR-prediction, however, extended-connectivity fingerprints still consistently deliver the best performance amongst the tested input representations. A potential future pathway to improve QSAR-modelling performance might be the development of techniques to increase AC-sensitivity.
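To make the two evaluation settings concrete, the sketch below computes AC-sensitivity (the fraction of true ACs that are recovered) once with both activities predicted and once with the measured activity of the first compound substituted for its prediction. Everything here is assumed for illustration: the function names, the 1.0 threshold, and the toy activity values are invented to show the mechanics and are not data or results from the study.

```python
def ac_sensitivity(true_labels, predicted_labels):
    """Sensitivity = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(true_labels, predicted_labels) if t and p)
    fn = sum(1 for t, p in zip(true_labels, predicted_labels) if t and not p)
    return tp / (tp + fn) if (tp + fn) > 0 else float("nan")


def predict_ac_labels(pairs, predict, threshold=1.0, first_activity_known=False):
    """Classify each pair as AC / non-AC from a single-molecule predictor.

    `pairs` holds tuples (mol_a, mol_b, measured_a, measured_b).
    If `first_activity_known` is True, the measured activity of mol_a
    replaces its predicted value. The threshold is an assumed value.
    """
    labels = []
    for mol_a, mol_b, measured_a, _measured_b in pairs:
        act_a = measured_a if first_activity_known else predict(mol_a)
        labels.append(abs(act_a - predict(mol_b)) > threshold)
    return labels


# Toy, made-up numbers: a smooth predictor that assigns similar
# values to both members of each pair.
toy_predictions = {"a": 7.5, "b": 7.8, "c": 7.0, "d": 7.1}
toy_pairs = [("a", "b", 6.0, 8.0),   # measured gap 2.0: a true AC
             ("c", "d", 7.0, 7.2)]   # measured gap 0.2: not an AC
true_labels = [abs(ma - mb) > 1.0 for _, _, ma, mb in toy_pairs]

both_unknown = predict_ac_labels(toy_pairs, toy_predictions.get)
one_known = predict_ac_labels(toy_pairs, toy_predictions.get,
                              first_activity_known=True)

if __name__ == "__main__":
    print(ac_sensitivity(true_labels, both_unknown))
    print(ac_sensitivity(true_labels, one_known))
```

In this toy case the smooth predictor misses the cliff when both values are predicted, but recovers it once one measured activity anchors the comparison, which mirrors the qualitative pattern the abstract reports.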