Asahara Ryosuke, Miyao Tomoyuki
Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan.
Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan.
ACS Omega. 2022 Jul 25;7(30):26952-26964. doi: 10.1021/acsomega.2c03812. eCollection 2022 Aug 2.
Predicting the outcomes of organic reactions using data-driven approaches helps accelerate research. In laboratory-scale experiments, only a small amount of reaction data is available for constructing machine learning models, so the choice of reaction representation plays a pivotal role in whether model construction succeeds. Nevertheless, reaction representations have not been adequately compared for small data sets. Herein, focusing on the enantioselectivity of phosphoric-acid-catalyzed reactions, various two-dimensional and three-dimensional reaction representations (descriptors) were compared. Overall, the concatenated form of the extended connectivity fingerprints showed the best predictive capability for the two types of data sets: high-throughput experimental data and manually collected literature data. Furthermore, highlighting the contribution of substructures to the prediction outcome was shown to be informative for guiding catalyst development.
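To illustrate what a "concatenated form" of fingerprints can look like, the following is a minimal sketch and not the authors' implementation. It assumes the extended connectivity fingerprint of each reaction component has already been computed as a bit vector (for example, with a cheminformatics toolkit), and it uses hypothetical component roles (`catalyst`, `substrate_1`, `substrate_2`) and toy 4-bit vectors. The helper names `concat_fingerprints` and `locate_bit` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical component roles; the actual set depends on the reaction data.
ROLES: Tuple[str, ...] = ("catalyst", "substrate_1", "substrate_2")


def concat_fingerprints(fps: Dict[str, Sequence[int]],
                        roles: Sequence[str] = ROLES) -> List[int]:
    """Join per-component bit vectors into one reaction vector, in fixed role order."""
    vector: List[int] = []
    for role in roles:
        vector.extend(fps[role])
    return vector


def locate_bit(index: int,
               fps: Dict[str, Sequence[int]],
               roles: Sequence[str] = ROLES) -> Tuple[str, int]:
    """Map an index in the concatenated vector back to (role, local bit index)."""
    if index < 0:
        raise IndexError(index)
    offset = 0
    for role in roles:
        n = len(fps[role])
        if index < offset + n:
            return role, index - offset
        offset += n
    raise IndexError(index)


# Toy 4-bit fingerprints (assumed, precomputed); real fingerprints are much longer.
toy = {
    "catalyst": [1, 0, 0, 1],
    "substrate_1": [0, 1, 0, 0],
    "substrate_2": [0, 0, 1, 1],
}
reaction_vector = concat_fingerprints(toy)
```

Because each component keeps its own block of positions, a feature that a model finds important can be traced back to a specific component with `locate_bit`, which is one way that substructure-level contributions could be attributed to, say, the catalyst rather than a substrate.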