ZBH-Center for Bioinformatics, Research Group for Computational Molecular Design, Universität Hamburg, Bundesstraße 43, Hamburg 20146, Germany.
J Chem Inf Model. 2021 Jan 25;61(1):238-251. doi: 10.1021/acs.jcim.0c00850. Epub 2020 Oct 21.
In similarity-driven virtual screening, molecular fingerprints are widely used to assess the similarity of every compound in a chemical library to a query compound of interest. This similarity analysis is traditionally performed for each member of the library individually. For chemical spaces that encode billions of compounds or more, it becomes impractical to enumerate all their products, let alone assess the similarity of each one, which makes this approach infeasible without substantial computational resources. In this work, we present a novel search algorithm named SpaceLight for topological fingerprint similarity searching in large, practically non-enumerable combinatorial fragment spaces. In contrast to existing methods, SpaceLight exploits the combinatorial character of these chemical spaces for efficiency while maintaining a high correlation between its description of molecular similarity and that of well-known molecular fingerprints such as ECFP. The resulting software can search prominent spaces such as EnamineREAL, with more than 10 billion compounds, in seconds on a standard desktop computer.
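The traditional per-compound analysis described above can be sketched as follows. This is a minimal illustration, not the SpaceLight algorithm: it assumes fingerprints are represented as sets of on-bit indices and uses the Tanimoto coefficient, a common choice for fingerprint comparison that the abstract itself does not name. The compound names, bit sets, and fragment counts are hypothetical and chosen only to show why scoring each product individually stops scaling once a fragment space grows combinatorially.

```python
from math import prod


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_library(query: frozenset, library: dict) -> list:
    """Score every library member individually against the query, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def space_size(fragments_per_position: list) -> int:
    """Number of products when one fragment is picked per position."""
    return prod(fragments_per_position)


# Hypothetical toy data: three enumerated compounds and one query.
query = frozenset({1, 4, 7, 9})
library = {
    "cpd_a": frozenset({1, 4, 7, 9}),
    "cpd_b": frozenset({1, 4, 8}),
    "cpd_c": frozenset({2, 3}),
}
ranking = rank_library(query, library)

# Hypothetical fragment counts for a single three-component reaction.
n_products = space_size([3000, 2000, 2000])
```

With these made-up counts, 7,000 fragments already encode 12 billion products, so the loop in `rank_library` would need 12 billion fingerprint comparisons after enumeration, whereas a method that works on the fragments directly only has to handle the fragments themselves.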