Transforming molecular cores, substituents, and combinations into structurally diverse compounds using chemical language models.

Author Information

Piazza Lisa, Srinivasan Sanjana, Tuccinardi Tiziano, Bajorath Jürgen

Affiliations

Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany; Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126, Pisa, Italy.

Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany.

Publication Information

Eur J Med Chem. 2025 Jul 5;291:117615. doi: 10.1016/j.ejmech.2025.117615. Epub 2025 Apr 10.

Abstract

Transformer-based chemical language models (CLMs) were derived to generate structurally and topologically diverse embeddings of core structure fragments, substituents, or core/substituent combinations in chemically proper compounds, representing a design task that is difficult to address using conventional structure generation methods. To this end, CLM variants were challenged to learn different fragment-to-compound mappings in the absence of structural rules or any other fragment linking or synthetic information. The resulting alternative models were found to have high syntactic fidelity, but displayed notable differences in their ability to generate valid candidate compounds containing test fragments, with a clear preference for a model variant processing core/substituent combinations. However, the majority of valid candidate compounds generated with all models were distinct from training data and structurally novel. In addition, the CLMs exhibited high chemical diversification capacity and often generated structures with new topologies not encountered during training. Furthermore, all models produced large numbers of close structural analogues of known bioactive compounds covering a large target space, thus indicating the relevance of newly generated candidates for pharmaceutical research. As a part of our study, the new methodology and all data are made publicly available.
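The abstract describes sampling candidate compounds from fragment-conditioned CLMs and then assessing whether the generated structures are chemically valid, contain the input fragment, and are novel with respect to the training data. Below is a minimal, hypothetical post-processing sketch of such checks using SMILES strings and RDKit; the function name `filter_candidates` and all inputs are illustrative assumptions, not the authors' released methodology.

```python
# Hypothetical post-processing sketch (not the authors' released code):
# given SMILES strings sampled from a fragment-to-compound CLM, keep only
# candidates that are chemically valid, contain the input core fragment,
# and are novel with respect to the training compounds.

from rdkit import Chem

def filter_candidates(generated_smiles, core_smiles, training_smiles):
    """Return valid, core-containing, novel candidates as canonical SMILES."""
    core = Chem.MolFromSmiles(core_smiles)
    if core is None:
        raise ValueError("core fragment is not a valid SMILES")

    # Canonicalize the training set once for the novelty comparison.
    training_canon = {Chem.CanonSmiles(s) for s in training_smiles
                      if Chem.MolFromSmiles(s) is not None}

    kept = []
    for smi in generated_smiles:
        mol = Chem.MolFromSmiles(smi)          # validity (syntax + valence)
        if mol is None:
            continue
        if not mol.HasSubstructMatch(core):    # must contain the test fragment
            continue
        canon = Chem.MolToSmiles(mol)
        if canon in training_canon:            # exclude training-set compounds
            continue
        kept.append(canon)
    return kept

# Toy example: benzene as the core, two sampled candidate strings.
print(filter_candidates(["c1ccccc1CC(=O)O", "invalid"], "c1ccccc1", ["c1ccccc1"]))
```

In practice, a novelty check of this kind is usually complemented by similarity searching against known bioactive compounds, which is how the abstract frames the identification of close structural analogues covering a large target space.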
