Yoshimori Atsushi, Bajorath Jürgen
Institute for Theoretical Medicine, Inc., 26-1 Muraoka-Higashi 2-chome, Fujisawa, Kanagawa, 251-0012, Japan.
Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany.
J Cheminform. 2025 Jan 15;17(1):5. doi: 10.1186/s13321-025-00951-3.
Analogue series (AS) are generated during compound optimization in medicinal chemistry and are the major source of structure-activity relationship (SAR) information. Pairs of active AS consisting of compounds with corresponding substituents and comparable potency progression represent SAR transfer events for the same target or across different targets. We report a new computational approach to systematically search for SAR transfer series that combines an AS alignment algorithm with context-dependent similarity assessment based on vector embeddings adapted from natural language processing. The methodology comprehensively accounts for substituent similarity, identifies non-classical bioisosteres, captures substituent-property relationships, and generates accurate AS alignments. Context-dependent similarity assessment is conceptually novel in computational medicinal chemistry and should also be of interest for other applications.
Scientific contribution
A method is reported to systematically search for and align analogue series with SAR transfer potential. Central to the approach is the assessment of context-dependent similarity for substituents, a new concept in cheminformatics, which is based upon vector embeddings and word pair relationships adapted from natural language processing.
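To make the combination of embedding-based similarity and series alignment more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each substituent is a "word" with a vector embedding, that two substituents are compared by cosine similarity, and that two series ordered by potency are aligned with a generic Needleman-Wunsch-style dynamic program. The embedding values, the gap penalty, and the names `EMBEDDINGS`, `similarity`, and `align` are all hypothetical and chosen only for illustration.

```python
import math

# Hand-made toy embeddings (assumed values, NOT taken from the paper).
# In practice such vectors would be learned from many analogue series.
EMBEDDINGS = {
    "F": (0.9, 0.1, 0.0),
    "Cl": (0.8, 0.2, 0.1),
    "CH3": (0.1, 0.9, 0.0),
    "OCH3": (0.0, 0.6, 0.8),
    "OH": (0.1, 0.3, 0.9),
}


def similarity(a, b):
    """Cosine similarity between the embeddings of two substituents."""
    u, v = EMBEDDINGS[a], EMBEDDINGS[b]
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def align(series_a, series_b, gap=-0.5):
    """Globally align two ordered substituent lists (Needleman-Wunsch style).

    Returns the total score and a list of (a, b) pairs; None marks a gap.
    The gap penalty is an arbitrary illustrative choice.
    """
    n, m = len(series_a), len(series_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(series_a[i - 1], series_b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        diag = score[i - 1][j - 1] + similarity(series_a[i - 1], series_b[j - 1])
        if math.isclose(score[i][j], diag):
            pairs.append((series_a[i - 1], series_b[j - 1]))
            i, j = i - 1, j - 1
        elif math.isclose(score[i][j], score[i - 1][j] + gap):
            pairs.append((series_a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, series_b[j - 1]))
            j -= 1
    while i > 0:
        pairs.append((series_a[i - 1], None))
        i -= 1
    while j > 0:
        pairs.append((None, series_b[j - 1]))
        j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    total, pairs = align(["F", "CH3", "OH"], ["Cl", "CH3", "OCH3"])
    print(round(total, 3), pairs)
```

In this toy setting, substituents with similar vectors (for example the two halogens) are paired even though they are not identical. This is the sense in which an embedding-based score can relate corresponding substituents across two series. A learned embedding would replace the hand-made table.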