Mastrolorito Fabrizio, Ciriaco Fulvio, Togo Maria Vittoria, Gambacorta Nicola, Trisciuzzi Daniela, Altomare Cosimo Damiano, Amoroso Nicola, Grisoni Francesca, Nicolotti Orazio
Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy.
Dipartimento di Chimica, Università degli Studi di Bari Aldo Moro, Bari, Italy.
Commun Chem. 2025 Jan 29;8(1):26. doi: 10.1038/s42004-025-01423-3.
Generative models have revolutionized de novo drug design, enabling the on-demand production of molecules with desired physicochemical and pharmacological properties. String-based molecular representations, such as SMILES (Simplified Molecular Input Line Entry System) and SELFIES (Self-Referencing Embedded Strings), have played a pivotal role in the success of generative approaches, thanks to their capacity to encode atom- and bond-level information and their ease of generation. However, such 'atom-level' string representations may be limited in how well they capture chirality and the synthetic accessibility of the corresponding designs. In this paper, we present fragSMILES, a novel fragment-based molecular representation in string form. fragSMILES encodes fragments in a 'chemically meaningful' way via a novel graph-reduction approach, yielding an efficient, interpretable, and expressive molecular representation that also avoids fragment redundancy. fragSMILES contributes to the field of fragment-based representations by reporting fragments and their 'breaking' bonds independently. Moreover, fragSMILES embeds information on molecular chirality, thereby overcoming known limitations of existing string notations. When compared with SMILES, SELFIES, and t-SMILES for de novo design, the fragSMILES notation showed promise in generating molecules with desirable biochemical and scaffold properties.
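To make the notion of an 'atom-level' string representation concrete, the following minimal sketch splits a SMILES string into atom, bond, and ring tokens and picks out the tokens that carry a chirality tag. It is an illustration only: the regular expression is a simplified, commonly used tokenization pattern and the helper names `tokenize` and `chiral_tokens` are hypothetical. It is not the fragSMILES encoding and not the tokenizer used by the authors.

```python
import re
from typing import List

# Simplified atom-level SMILES token pattern (an assumption for illustration;
# it does not cover every corner of the SMILES grammar).
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnops]|%\d{2}|\d|[=#\-\+\\/\(\)\.:~@\?>\*\$])"
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into atom-level tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject strings containing characters the pattern does not recognize.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def chiral_tokens(tokens: List[str]) -> List[str]:
    """Return bracket-atom tokens that carry a chirality tag (@ or @@)."""
    return [t for t in tokens if t.startswith("[") and "@" in t]


if __name__ == "__main__":
    # The two enantiomers of alanine, written with opposite chirality tags.
    for smi in ("N[C@@H](C)C(=O)O", "N[C@H](C)C(=O)O"):
        toks = tokenize(smi)
        print(smi, toks, chiral_tokens(toks))
```

In this sketch the two enantiomers differ in a single token, which shows how little of an atom-level string is devoted to stereochemistry.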