Manelfi Candida, Tazzari Valerio, Lunghini Filippo, Cerchia Carmen, Fava Anna, Pedretti Alessandro, Stouten Pieter F W, Vistoli Giulio, Beccari Andrea Rosario
EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Napoli, Italy.
Department of Pharmacy, University of Naples "Federico II", Via D. Montesano 49, 80131, Napoli, Italy.
J Cheminform. 2024 Feb 23;16(1):21. doi: 10.1186/s13321-024-00813-4.
The conversion of chemical structures into computer-readable descriptors that capture key structural aspects is of pivotal importance in cheminformatics and computer-aided drug design. Molecular fingerprints are a widely employed class of descriptors; however, their generation is time-consuming for large databases and susceptible to bias. Descriptors that accurately detect predefined structural fragments without lengthy generation procedures would therefore be highly desirable. To meet additional needs, such descriptors should also be interpretable by medicinal chemists and suitable for indexing databases with trillions of compounds. To this end, as an integral part of EXSCALATE, Dompé's end-to-end drug discovery platform, we developed the DompeKeys (DK), a new substructure-based descriptor set that encodes the chemical features characterizing compounds of pharmaceutical interest. DK are an exhaustive collection of curated SMARTS strings that define chemical features at different levels of complexity, from specific functional groups and structural patterns down to simpler pharmacophoric points; together they form a network of hierarchically interconnected substructures. Because of their extended, hierarchical structure, DK perform well across different kinds of applications. In particular, we show that they are very well suited to mapping chemical space, substructure search, and virtual screening. Notably, incorporating DK yields high-performing machine learning models for predicting both compound activity and the occurrence of metabolic reactions.
The protocol to generate the DK is freely available at https://dompekeys.exscalate.eu and is fully integrated with the Molecular Anatomy protocol for the generation and analysis of hierarchically interconnected molecular scaffolds and frameworks, thus providing a comprehensive and flexible tool for drug design applications.
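To make the idea of hierarchically interconnected substructure keys more concrete, here is a minimal Python sketch under stated assumptions. It assumes SMARTS matching has already been done (for example with a cheminformatics toolkit such as RDKit) and that each molecule is represented by the set of keys it matched directly. The key names and parent links are hypothetical placeholders, not actual DompeKeys definitions. The hierarchy is used to switch on every more general key implied by a specific match, and the Tanimoto coefficient is used only as a common choice for comparing binary keys; the abstract does not state which similarity metric DK use.

```python
from typing import Dict, List, Set, Tuple

# HYPOTHETICAL toy hierarchy: each key lists its more general parent keys.
# Names and links are illustrative only, NOT actual DompeKeys definitions.
PARENTS: Dict[str, List[str]] = {
    "carboxylic_acid": ["carbonyl", "hbond_donor", "hbond_acceptor"],
    "primary_amide": ["carbonyl", "hbond_donor", "hbond_acceptor"],
    "pyridine": ["aromatic_ring", "hbond_acceptor"],
    "phenyl": ["aromatic_ring"],
    "carbonyl": ["hbond_acceptor"],
    "aromatic_ring": [],  # pharmacophoric-level keys have no parents here
    "hbond_donor": [],
    "hbond_acceptor": [],
}
KEYS: List[str] = sorted(PARENTS)  # fixed bit order for the key vector


def expand(matched: Set[str]) -> Set[str]:
    """Return the directly matched keys plus all of their ancestors.

    Assumption: if a specific pattern matches, every more general
    pattern above it in the hierarchy also matches.
    """
    result: Set[str] = set()
    stack = list(matched)
    while stack:
        key = stack.pop()
        if key not in result:
            result.add(key)
            stack.extend(PARENTS[key])
    return result


def to_bits(matched: Set[str]) -> List[int]:
    """Encode a molecule's matched keys as a binary key vector."""
    on = expand(matched)
    return [1 if key in on else 0 for key in KEYS]


def tanimoto(a: List[int], b: List[int]) -> float:
    """Tanimoto coefficient: shared on-bits / on-bits set in either vector."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


def rank(query: Set[str], library: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank library entries by key-vector similarity to the query."""
    q = to_bits(query)
    scores = [(name, tanimoto(q, to_bits(keys))) for name, keys in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Directly matched keys per molecule (assumed output of SMARTS matching).
    library = {
        "benzamide": {"primary_amide", "phenyl"},
        "pyridine": {"pyridine"},
    }
    # Query: benzoic acid, described by its directly matched keys.
    for name, score in rank({"carboxylic_acid", "phenyl"}, library):
        print(f"{name}: {score:.2f}")
```

In this toy example the benzoic acid query ranks the benzamide entry (5/7, printed as 0.71) ahead of the pyridine entry (2/7, printed as 0.29). The two acids/amides differ in their specific functional-group keys, yet the shared general keys (carbonyl, H-bond donor and acceptor, aromatic ring) still make them score as close neighbours, which illustrates how keys at several levels of complexity can serve both fine-grained substructure discrimination and broader similarity searching.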