Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Endenicher Allee 19c, Rheinische Friedrich-Wilhelms-Universität, D-53115 Bonn, Germany.
J Med Chem. 2018 Nov 21;61(22):10255-10264. doi: 10.1021/acs.jmedchem.8b01404. Epub 2018 Nov 13.
Assay interference compounds give rise to false positives and cause substantial problems in medicinal chemistry. Nearly 500 compound classes have been designated as pan-assay interference compounds (PAINS), which typically occur as substructures in other molecules. The structural environment of a PAINS substructure is likely to play an important role in its potential reactivity. Given the large number of PAINS and their highly variable structural contexts, it is difficult to study context dependence on the basis of expert knowledge. Hence, we applied machine learning to predict PAINS that are promiscuous and distinguish them from others that are mostly inactive. Surprisingly accurate models can be derived using different methods such as support vector machines, random forests, or deep neural networks. Moreover, structural features that favor correct predictions have been identified, mapped, and categorized, shedding light on the structural context dependence of PAINS effects. The machine learning models presented herein further extend the capacity of PAINS filters.
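The classification task described in the abstract (separating promiscuous from mostly inactive compounds that share a PAINS substructure) can be illustrated with a minimal sketch. It is not the authors' method: the abstract names support vector machines, random forests, and deep neural networks, whereas this sketch swaps in a simple k-nearest-neighbor vote on Tanimoto similarity so that it runs with the Python standard library alone. The feature names, the training compounds, and their labels are hypothetical toy values invented for illustration; they are not data from the paper.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two structural feature sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_predict(query: frozenset, training: list, k: int = 3) -> str:
    """Return the majority label among the k training compounds most similar to the query."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data (assumption, not from the paper): every compound
# contains the same PAINS substructure and differs only in its made-up
# structural context features.
TRAINING = [
    (frozenset({"pains:quinone", "ctx:hydroxyl", "ctx:fused_ring"}), "promiscuous"),
    (frozenset({"pains:quinone", "ctx:hydroxyl", "ctx:halogen"}), "promiscuous"),
    (frozenset({"pains:quinone", "ctx:fused_ring", "ctx:halogen"}), "promiscuous"),
    (frozenset({"pains:quinone", "ctx:bulky_alkyl", "ctx:amide"}), "inactive"),
    (frozenset({"pains:quinone", "ctx:bulky_alkyl", "ctx:ester"}), "inactive"),
    (frozenset({"pains:quinone", "ctx:amide", "ctx:ester"}), "inactive"),
]

# Two query compounds with the same PAINS substructure but different contexts.
query_a = frozenset({"pains:quinone", "ctx:hydroxyl", "ctx:fused_ring", "ctx:halogen"})
query_b = frozenset({"pains:quinone", "ctx:bulky_alkyl", "ctx:amide", "ctx:ester"})

prediction_a = knn_predict(query_a, TRAINING, k=3)
prediction_b = knn_predict(query_b, TRAINING, k=3)
print(prediction_a, prediction_b)
```

In practice the feature sets would more plausibly come from molecular fingerprints computed by a cheminformatics toolkit, and the nearest-neighbor vote would be replaced by one of the model types the abstract lists. The point of the sketch is only that two compounds carrying the same PAINS substructure can receive different predictions once their structural context is encoded as features.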