Mastrolorito Fabrizio, Ciriaco Fulvio, Nicolotti Orazio, Grisoni Francesca
Department of Biomedical Engineering, Institute for Complex Molecular Systems (ICMS) & Eindhoven AI Systems Institute (EAISI), Eindhoven University of Technology, Eindhoven, The Netherlands.
Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy.
Chem Commun (Camb). 2025 Sep 11. doi: 10.1039/d5cc02641e.
This work focuses on organic reaction prediction with deep learning using the recently introduced fragSMILES representation, which encodes molecular substructures and chirality in a compact and expressive textual form. In a systematic comparison with well-established molecular notations, namely simplified molecular input line entry system (SMILES), self-referencing embedded strings (SELFIES), sequential attachment-based fragment embedding (SAFE) and tree-based SMILES (t-SMILES), fragSMILES achieved the highest performance across forward- and retro-synthesis prediction, with superior recognition of stereochemical reaction information. Moreover, fragSMILES enhances the capacity to capture stereochemical complexity, a key challenge in synthesis planning. Our results demonstrate that chirality-aware and fragment-level representations can advance current computer-assisted synthesis planning efforts.