Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070 Basel, Switzerland.
Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah 84112, United States.
Chem Rev. 2024 Nov 27;124(22):12551-12572. doi: 10.1021/acs.chemrev.4c00284. Epub 2024 Nov 7.
DNA-encoded library (DEL) technology is a powerful platform for efficiently identifying novel chemical matter in early drug discovery, enabled by parallel screening of vast libraries of encoded small molecules through affinity selection and deep sequencing. While DEL selections provide rich data sets for computational drug discovery, the underlying technical factors influencing DEL data remain incompletely understood. This review systematically examines the key parameters affecting the chemical information in DEL data and their impact on hit triaging and machine learning integration. The need for rigorous data handling and interpretation is emphasized, with standardized methods being critical for the success of DEL-based approaches. Major challenges include the relationship between sequence counts and binding affinities, frequent hitters, and the influence of factors such as inhomogeneous library composition, DNA damage, and linkers on binding modes. Experimental artifacts, such as those caused by protein immobilization and screening matrix effects, further complicate data interpretation. Recent advancements in using machine learning to denoise DEL data and predict drug candidates are highlighted. This review offers practical guidance on adopting best practices for integrating robust methodologies, comprehensive data analysis, and computational tools to improve the accuracy and efficacy of DEL-driven hit discovery.