Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States.
Department of Medicinal Chemistry, University of Florida, Gainesville, Florida 32610, United States.
J Chem Inf Model. 2022 May 9;62(9):2186-2201. doi: 10.1021/acs.jcim.1c01013. Epub 2021 Nov 1.
The quantification of chemical diversity has many applications in drug discovery, organic chemistry, and food and natural product chemistry, among other fields. As chemical space expands rapidly, it is imperative to develop efficient methods to quantify the diversity of large and ultralarge chemical libraries and to visualize their mutual relationships in chemical space. Herein, we apply our recently introduced extended similarity indices to measure the fingerprint-based diversity of 19 chemical libraries, together comprising over 18 million compounds, that are typically used in drug discovery and natural products research. Based on this concept, we introduce Chemical Library Networks (CLNs) as a general and efficient framework for visually representing the chemical space of large chemical libraries, providing a global perspective on the relationships between the libraries. For the 19 compound libraries explored in this work, the (extended) Tanimoto index in combination with RDKit fingerprints was found to offer the best description of extended similarity. CLNs are general and can be built with any structure representation and similarity coefficient for large chemical libraries.
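To make the idea of a library network concrete, the following is a minimal sketch and not the authors' method. It uses the ordinary pairwise Tanimoto coefficient as a stand-in for the extended similarity indices, which are not defined in this abstract, and it assumes each fingerprint is given as a set of on-bit indices. The function names, the toy libraries, and the edge threshold are all hypothetical. The pairwise averaging shown here scales quadratically with library size, so it illustrates only the network construction, not an efficient approach for ultralarge libraries.

```python
from itertools import combinations


def tanimoto(a, b):
    """Pairwise Tanimoto coefficient for fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    # Returning 0.0 for two empty fingerprints is a choice made for this sketch.
    return len(a & b) / union if union else 0.0


def mean_inter_library_similarity(lib_a, lib_b):
    """Average Tanimoto over all cross-library pairs (quadratic cost)."""
    total = sum(tanimoto(x, y) for x in lib_a for y in lib_b)
    return total / (len(lib_a) * len(lib_b))


def library_network(libraries, threshold):
    """Return weighted edges between libraries whose mean similarity meets the threshold."""
    edges = {}
    for (name_a, fps_a), (name_b, fps_b) in combinations(libraries.items(), 2):
        sim = mean_inter_library_similarity(fps_a, fps_b)
        if sim >= threshold:
            edges[(name_a, name_b)] = sim
    return edges


# Toy data (hypothetical): three tiny "libraries" of two fingerprints each.
libraries = {
    "lib_A": [frozenset({1, 2, 3}), frozenset({2, 3, 4})],
    "lib_B": [frozenset({2, 3, 5}), frozenset({1, 2, 3, 4})],
    "lib_C": [frozenset({10, 11}), frozenset({11, 12})],
}

edges = library_network(libraries, threshold=0.3)
print(edges)  # {('lib_A', 'lib_B'): 0.625}
```

In this toy case only lib_A and lib_B are connected, because lib_C shares no on-bits with either. In a real setting, the fingerprints would come from a cheminformatics toolkit such as RDKit, and the pairwise average would be replaced by a measure that compares many molecules at once.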