Finta Sára, Kalikadien Adarsh V, Pidko Evgeny A
Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands.
J Chem Theory Comput. 2025 May 27;21(10):5334-5345. doi: 10.1021/acs.jctc.5c00303. Epub 2025 May 9.
Transition-metal complexes serve as highly enantioselective homogeneous catalysts for various transformations, making them valuable in the pharmaceutical industry. Data-driven prediction models can accelerate high-throughput catalyst design but require computer-readable representations that account for conformational flexibility. This is typically achieved through high-level conformer searches, followed by DFT optimization of the transition-metal complexes. However, conformer selection remains reliant on human assumptions, with no cost-efficient and generalizable workflow available. To address this, we introduce an automated approach to correlate CREST(GFN2-xTB//GFN-FF)-generated conformer ensembles with their DFT-optimized counterparts for systematic conformer selection. We analyzed 24 precatalyst structures, performing CREST conformer searches, followed by full DFT optimization. Three filtering methods were evaluated: (i) geometric ligand descriptors, (ii) PCA-based selection, and (iii) DBSCAN clustering using RMSD and energy. The proposed methods were validated on Rh-based catalysts featuring bisphosphine ligands, which are widely employed in hydrogenation reactions. To assess general applicability, both the precatalyst and its corresponding acrylate-bound complex were analyzed. Our results confirm that CREST overestimates ligand flexibility, and energy-based filtering is ineffective. PCA-based selection failed to distinguish conformers by DFT energy, while RMSD-based filtering improved selection but lacked tunability. DBSCAN clustering provided the most effective approach, eliminating redundancies while preserving key configurations. This method remained robust across data sets and is computationally efficient without requiring molecular descriptor calculations. These findings highlight the limitations of energy-based filtering and the advantages of structure-based approaches for conformer selection. 
While DBSCAN clustering is a practical solution, its parameters remain system-dependent. For high-accuracy applications, refined energy calculations may be necessary; however, DBSCAN-based clustering offers a computationally accessible strategy for rapid catalyst representations involving conformational flexibility.
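The DBSCAN-based selection described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how RMSD and energy are combined, so the example assumes one plausible reading, in which conformers are clustered on a precomputed pairwise RMSD matrix and the lowest-energy member of each cluster is kept. The function names, the toy RMSD matrix, the energies, and the values of `eps` and `min_samples` are all hypothetical, and conformers labelled as noise are kept as their own representatives by assumption.

```python
from typing import Dict, List, Optional, Sequence

NOISE = -1  # label for conformers that belong to no dense cluster


def dbscan_precomputed(dist: Sequence[Sequence[float]],
                       eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN on a symmetric pairwise distance (here: RMSD) matrix."""
    n = len(dist)
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"

    def neighbors(i: int) -> List[int]:
        # The neighborhood includes the point itself (dist[i][i] == 0).
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # core point: expand the cluster
                queue.extend(nbrs)
        cluster += 1
    return [int(label) for label in labels]


def select_representatives(labels: Sequence[int],
                           energies: Sequence[float]) -> List[int]:
    """Keep the lowest-energy conformer per cluster (assumed criterion)."""
    best: Dict[int, int] = {}
    kept: List[int] = []
    for idx, label in enumerate(labels):
        if label == NOISE:
            kept.append(idx)  # assumption: isolated conformers are retained
        elif label not in best or energies[idx] < energies[best[label]]:
            best[label] = idx
    return sorted(kept + list(best.values()))


# Hypothetical pairwise RMSD matrix (in angstrom) for six conformers.
rmsd = [
    [0.0, 0.2, 0.3, 2.0, 2.1, 3.0],
    [0.2, 0.0, 0.25, 2.0, 2.0, 3.1],
    [0.3, 0.25, 0.0, 1.9, 2.0, 3.0],
    [2.0, 2.0, 1.9, 0.0, 0.25, 2.8],
    [2.1, 2.0, 2.0, 0.25, 0.0, 2.9],
    [3.0, 3.1, 3.0, 2.8, 2.9, 0.0],
]
# Hypothetical relative energies (kcal/mol) for the same conformers.
energies = [1.2, 0.0, 0.8, 3.1, 2.5, 5.0]

labels = dbscan_precomputed(rmsd, eps=0.5, min_samples=2)
selected = select_representatives(labels, energies)
print(labels)    # [0, 0, 0, 1, 1, -1]
print(selected)  # [1, 4, 5]
```

In this toy case the six conformers collapse to three representatives: one per dense cluster plus the isolated structure. Consistent with the abstract's remark that the parameters are system-dependent, `eps` and `min_samples` would need to be tuned for each data set.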