Ha Seung Kyun, Kalyani Dipannita, West Michael S, Xu Jessica, Lam Yu-Hong, Struble Thomas, Dreher Spencer, Krska Shane W, Buchwald Stephen L, Jensen Klavs F
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
Discovery Chemistry, Merck & Co., Inc., Rahway, New Jersey 07065, United States.
J Am Chem Soc. 2025 Jun 11;147(23):19602-19613. doi: 10.1021/jacs.5c00933. Epub 2025 May 29.
This manuscript presents machine learning models for Pd-catalyzed C-N couplings built from a large, pharmaceutically relevant, structurally diverse dataset (4204 unique products) generated by high-throughput experimentation. Generating the dataset was made possible by the discovery of novel nanomole-scale-compatible, automation-friendly C-N coupling reaction conditions using LiOTMS as the base. The size of the dataset enabled a systematic evaluation of model performance under five different data-splitting strategies, designed to assess the models' ability both to interpolate and to extrapolate. The models exhibit high predictive performance across all splits as gauged by standard metrics. In addition, the models predicted with high accuracy the outcomes of validation libraries that lay outside the scope of the training set. Employing these models in medicinal chemistry campaigns should substantially enrich the proportion of successful C-N couplings.
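The abstract does not describe the five splitting strategies in detail. The sketch below therefore only illustrates the general distinction it draws between interpolation and extrapolation; it is not the authors' protocol.

It assumes hypothetical reaction records keyed by aryl halide and amine identifiers, and these are placeholders rather than data from the paper. Under a random split, the same building blocks can appear in both the training and test sets. Holding out every reaction of selected amines instead forces the model to predict couplings for partners it never saw during training.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record: (aryl_halide_id, amine_id, yield_percent).
# Identifiers and yields are illustrative placeholders, not data from the paper.
Reaction = Tuple[str, str, float]


def random_split(
    rxns: List[Reaction], test_frac: float, seed: int = 0
) -> Tuple[List[Reaction], List[Reaction]]:
    """Interpolation-style split: shuffle reactions and cut at random, so the
    same aryl halides and amines can occur in both train and test."""
    rng = random.Random(seed)
    shuffled = list(rxns)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def held_out_amine_split(
    rxns: List[Reaction], test_amines: Sequence[str]
) -> Tuple[List[Reaction], List[Reaction]]:
    """Extrapolation-style split: every reaction using a held-out amine goes to
    test, so those amines are never seen during training."""
    held = set(test_amines)
    train = [r for r in rxns if r[1] not in held]
    test = [r for r in rxns if r[1] in held]
    return train, test


# Toy grid: 4 placeholder aryl halides x 5 placeholder amines = 20 reactions.
rxns = [(f"ArX{i}", f"amine{j}", 0.0) for i in range(4) for j in range(5)]

train_r, test_r = random_split(rxns, test_frac=0.2)
train_g, test_g = held_out_amine_split(rxns, ["amine3", "amine4"])

print(len(train_r), len(test_r))  # prints "16 4"
print(len(train_g), len(test_g))  # prints "12 8"
```

In the held-out split, no amine in the test set occurs in the training set. A model scored on it is therefore being asked to extrapolate to new coupling partners, which is a harder test than the random split.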