Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany.
Expert Opin Drug Discov. 2024 Apr;19(4):403-414. doi: 10.1080/17460441.2024.2313475. Epub 2024 Feb 5.
Large chemical spaces (CSs) include traditional large compound collections, combinatorial libraries covering billions to trillions of molecules, DNA-encoded chemical libraries comprising complete combinatorial CSs in a single mixture, and virtual CSs explored by generative models. The diverse nature of these types of CSs requires different chemoinformatic approaches for navigation.
An overview of different types of large CSs is provided. Molecular representations and similarity metrics suitable for large CS exploration are discussed. A summary of navigation of CSs in generative models is provided. Methods for characterizing and comparing CSs are discussed.
The size of large CSs might restrict navigation to specialized algorithms and limit it to neighborhoods of structurally similar molecules. Efficient navigation of large CSs requires not only methods that scale with size but also smart approaches that focus on better, though not necessarily larger, molecule selections. Deep generative models aim to provide such approaches by implicitly learning features relevant for targeted biological properties. It remains unclear whether these models can fulfill this ideal, because validation is difficult as long as the covered CSs remain mainly virtual and lack experimental verification.