Chen Hengwei, Bajorath Jürgen
Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115 Bonn, Germany.
Lamarr Institute for Machine Learning and Artificial Intelligence, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115 Bonn, Germany.
J Chem Inf Model. 2024 Dec 9;64(23):8784-8795. doi: 10.1021/acs.jcim.4c01781. Epub 2024 Nov 15.
In medicinal chemistry, compound optimization relies on the generation of analogue series (AS) for exploring structure-activity relationships (SARs). Potency progression is a critical criterion for advancing AS, and a key question during optimization is which analogues to synthesize next. We introduce a new computational methodology, reported here for the first time, for extending AS with potent compounds that contain both core structure modifications and substituent modifications at multiple sites. The approach combines a transformer chemical language model (CLM) with a SAR matrix (SARM) methodology that identifies and organizes structurally related AS. For this purpose, the SARM approach was expanded to cover multisite AS. Consensus series extracted from SARMs, each representing a potency gradient, served as input for CLM training to extend test AS with potent analogues. Different model variants were derived and investigated. Both general and fine-tuned models ranked known potent analogues at high positions in probability-based compound rankings. In addition, the generated candidate compounds chemically diversified AS through core structure modifications and substituent replacements at multiple sites.
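The two ideas the abstract leans on, a series ordered along a potency gradient and a probability-based ranking in which a known potent analogue should appear near the top, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the SMILES strings, the potency values, and the log-probabilities are all hypothetical placeholders invented for illustration, and compounds are compared as plain strings, which assumes the SMILES are already canonicalized.

```python
from typing import List, Optional, Tuple


def potency_ordered_series(series: List[Tuple[str, float]]) -> List[str]:
    """Return the compounds of one analogue series sorted by increasing potency.

    `series` holds (SMILES, potency) pairs; the potency scale (e.g. pKi) is an
    assumption made for this sketch only.
    """
    return [smiles for smiles, _ in sorted(series, key=lambda pair: pair[1])]


def rank_of_known_analogue(
    candidates: List[Tuple[str, float]], known: str
) -> Optional[int]:
    """Return the 1-based rank of `known` among generated candidates.

    `candidates` holds (SMILES, log-probability) pairs; a higher
    log-probability means a better rank. Returns None if `known` was not
    generated. String equality assumes canonical SMILES.
    """
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    for position, (smiles, _) in enumerate(ranked, start=1):
        if smiles == known:
            return position
    return None


# Toy data (invented): three analogues with made-up potency values.
series = [("c1ccccc1O", 6.2), ("c1ccccc1N", 5.1), ("c1ccccc1Cl", 7.4)]
ordered = potency_ordered_series(series)

# Toy model output (invented): candidates with made-up log-probabilities.
candidates = [("c1ccccc1F", -2.3), ("c1ccccc1Br", -0.7), ("c1ccccc1I", -1.5)]
rank = rank_of_known_analogue(candidates, "c1ccccc1I")
```

In this toy setting, the ordered series would play the role of the model input, and the rank of a held-out potent analogue would serve as the evaluation signal. A real workflow would need structure canonicalization and a trained model to supply the probabilities.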