Singh Sukriti, Hernández-Lobato José Miguel
Department of Engineering, University of Cambridge, Cambridge, UK.
Nat Commun. 2025 Apr 16;16(1):3599. doi: 10.1038/s41467-025-58854-8.
Transition metal-catalyzed asymmetric reactions are of high contemporary importance in organic synthesis. Recently, machine learning (ML) has shown promise in accelerating the development of new catalytic protocols. However, the need for large amounts of experimental data can be a bottleneck for implementing ML models. Here, we propose a meta-learning workflow that harnesses literature-derived data to extract shared reaction features and requires only a few examples to predict the outcome of new reactions. Prototypical networks are used as the meta-learning method to predict the enantioselectivity of asymmetric hydrogenation of olefins. This meta-learning model consistently provides a significant performance improvement over other popular ML methods such as random forests and graph neural networks. The performance of our meta-model is analyzed with varying numbers of training examples to demonstrate its utility even with limited data. Good model performance on an out-of-sample test set further indicates the general applicability of our approach. We believe this work will provide a leap forward in identifying promising reactions in the early phases of reaction development, when minimal data is available.
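To make the prototypical-network idea concrete, the following is a minimal sketch of its core classification step, not the authors' implementation. It assumes that enantioselectivity has been binned into discrete classes (for example, low versus high selectivity) and that each reaction has already been mapped to a fixed-length embedding; in a real prototypical network that embedding would come from a trained neural encoder, which is omitted here. The function names, the toy two-dimensional embeddings, and the labels are all hypothetical.

```python
import numpy as np


def compute_prototypes(support_emb, support_labels, n_classes):
    """Return one prototype per class: the mean embedding of its support examples."""
    return np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in range(n_classes)]
    )


def class_probabilities(query_emb, prototypes):
    """Softmax over negative squared Euclidean distances to each prototype."""
    # diffs has shape (n_queries, n_classes, embedding_dim) via broadcasting.
    diffs = query_emb[:, None, :] - prototypes[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=-1)
    logits = -sq_dist
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical toy data: four labelled "support" reactions, two unlabelled "query" reactions.
# Class 0 and class 1 stand in for assumed selectivity bins.
support_emb = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]])
support_labels = np.array([0, 0, 1, 1])
query_emb = np.array([[0.1, 0.1], [0.9, 1.0]])

prototypes = compute_prototypes(support_emb, support_labels, n_classes=2)
probs = class_probabilities(query_emb, prototypes)
predicted = probs.argmax(axis=1)
print(predicted)
```

Because a new reaction class needs only enough labelled examples to form a prototype, this kind of model can make predictions from a handful of data points, which is the few-shot setting the abstract describes.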