Cavendish Laboratory, University of Cambridge, Cambridge CB3 0HE, United Kingdom;
Medicine Design, Pfizer Inc., Cambridge, MA 02139.
Proc Natl Acad Sci U S A. 2019 Feb 26;116(9):3373-3378. doi: 10.1073/pnas.1810847116. Epub 2019 Feb 11.
Predicting ligand biological activity is a key challenge in drug discovery. Ligand-based statistical approaches are often hampered by noise due to undersampling: The number of molecules known to be active or inactive is vastly less than the number of possible chemical features that might determine binding. We derive a statistical framework inspired by random matrix theory and combine the framework with high-quality negative data to discover important chemical differences between active and inactive molecules by disentangling undersampling noise. Our model outperforms standard benchmarks when tested against a set of challenging retrospective tests. We prospectively apply our model to the human muscarinic acetylcholine receptor M1, finding four experimentally confirmed agonists that are chemically dissimilar to all known ligands. The hit rate of our model is significantly higher than the state of the art. Our model can be interpreted and visualized to offer chemical insights about the molecular motifs that are synergistic or antagonistic to M1 agonism, which we have prospectively experimentally verified.
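The random-matrix idea behind separating signal from undersampling noise can be illustrated with the standard Marchenko-Pastur eigenvalue filter. When there are n molecules and p features, an empirical feature correlation matrix built from pure noise has eigenvalues that fall, for large n and p, below the edge (1 + sqrt(p/n))^2. Eigenmodes above that edge are therefore candidates for genuine chemical correlations.

The sketch below is a generic illustration under stated assumptions, not the authors' published model. It uses continuous toy data with one planted correlated block, whereas real chemical fingerprints are binary. The names `mp_upper_edge` and `cleaned_correlation` are hypothetical helpers introduced here. In the paper's setting, a filter of this kind would be applied separately to active and inactive sets, and the cleaned matrices would then be contrasted; that contrast step is not shown.

```python
import numpy as np


def mp_upper_edge(n_samples: int, n_features: int) -> float:
    """Upper edge of the Marchenko-Pastur spectrum for unit-variance noise.

    lambda_plus = (1 + sqrt(gamma))**2, with gamma = n_features / n_samples.
    """
    gamma = n_features / n_samples
    return (1.0 + np.sqrt(gamma)) ** 2


def cleaned_correlation(X: np.ndarray) -> np.ndarray:
    """Return the feature correlation matrix keeping only modes above the MP edge.

    X has shape (n_samples, n_features). This is a hypothetical helper and an
    illustrative sketch of eigenvalue clipping, not the paper's exact procedure.
    """
    n_samples, n_features = X.shape
    # z-score each feature column; constant columns become all-zero columns
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X - mu) / sd
    # empirical correlation matrix, shape (n_features, n_features)
    C = Z.T @ Z / n_samples
    evals, evecs = np.linalg.eigh(C)
    # keep eigenmodes that rise above the pure-noise bulk
    keep = evals > mp_upper_edge(n_samples, n_features)
    V = evecs[:, keep]
    return (V * evals[keep]) @ V.T


# Toy usage: 400 "molecules", 50 features, the first 10 features share a latent factor.
rng = np.random.default_rng(0)
n, p, k = 400, 50, 10
factor = rng.standard_normal((n, 1))
X = rng.standard_normal((n, p))
X[:, :k] = np.sqrt(0.8) * factor + np.sqrt(0.2) * X[:, :k]  # pairwise correlation ~0.8
C_clean = cleaned_correlation(X)
```

With these settings the planted block has a population eigenvalue of about 1 + 9 * 0.8 = 8.2, far above the noise edge of roughly 1.83, so the leading retained mode should concentrate on the first 10 features while most noise modes are discarded.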