Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida, 32611, United States.
DIFACQUIM Research Group, Department of Pharmacy, National Autonomous University of Mexico, Mexico City, 04510, Mexico.
Mol Inform. 2023 Jul;42(7):e2300056. doi: 10.1002/minf.202300056. Epub 2023 Jun 7.
Understanding structure-activity landscapes is essential in drug discovery. Moreover, activity cliffs in compound data sets have been shown to substantially affect both design progress and the predictive ability of machine learning models. With the continued expansion of chemical space and the large and ultra-large libraries now available, efficient tools are needed to analyze the activity landscape of compound data sets rapidly. The goal of this study is to show that n-ary indices can quantify the structure-activity landscapes of large compound data sets rapidly and efficiently using different types of structural representation. We also discuss how a recently introduced medoid algorithm provides the foundation for finding optimum correlations between similarity measures and structure-activity rankings. We demonstrate the applicability of the n-ary indices and the medoid algorithm by analyzing the activity landscapes of 10 pharmaceutically relevant compound data sets using three fingerprints of different designs, 16 extended similarity indices, and 11 coincidence thresholds.
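As a rough illustration of the ideas named in the abstract, the sketch below computes an unweighted, simplified n-ary Jaccard-Tanimoto index over a set of binary fingerprints with a coincidence threshold, and picks a medoid as the molecule whose removal leaves the least similar set. The abstract gives no implementation details, so everything here is an assumption: the function names `extended_jaccard` and `medoid_index` are hypothetical, the published indices may apply weighting that is omitted here, and the medoid rule shown is only one plausible reading of the approach.

```python
import numpy as np


def extended_jaccard(fps, c_threshold=0):
    """Simplified (unweighted) n-ary Jaccard-Tanimoto index.

    fps: 2D array-like of 0/1 values, one row per molecule.
    c_threshold: coincidence threshold (assumed definition): a bit position
    counts as "1-similar" when (2*k - n) > c_threshold, where k is the
    number of molecules with that bit on, and as "dissimilar" when
    |2*k - n| <= c_threshold.
    """
    fps = np.asarray(fps)
    n = fps.shape[0]
    col_sums = fps.sum(axis=0)          # k for every bit position
    balance = 2 * col_sums - n
    one_sim = np.sum(balance > c_threshold)
    dissim = np.sum(np.abs(balance) <= c_threshold)
    denom = one_sim + dissim
    return float(one_sim / denom) if denom else 0.0


def medoid_index(fps, c_threshold=0):
    """Index of the row whose removal gives the lowest n-ary similarity.

    This "complementary similarity" rule is an assumption about how a
    medoid can be chosen; ties resolve to the first such row.
    """
    fps = np.asarray(fps)
    complementary = [
        extended_jaccard(np.delete(fps, i, axis=0), c_threshold)
        for i in range(fps.shape[0])
    ]
    return int(np.argmin(complementary))


if __name__ == "__main__":
    toy = [
        [1, 1, 1, 0],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
    ]
    print(extended_jaccard(toy[:2]))       # two identical fingerprints
    print(medoid_index(toy, c_threshold=1))
```

For two fingerprints and a threshold of 0, this simplified index reduces to the ordinary pairwise Tanimoto coefficient. Because it works from column sums, each evaluation scales linearly with the number of molecules rather than quadratically.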