Vogel Gabriel, Weber Jana M
Department of Intelligent Systems, Delft University of Technology, Delft 2629 HZ, The Netherlands
Chem Sci. 2024 Dec 17;16(3):1161-1178. doi: 10.1039/d4sc05900j. eCollection 2025 Jan 15.
The demand for innovative synthetic polymers with improved properties is high, but their structural complexity and vast design space hinder rapid discovery. Machine learning-guided molecular design is a promising approach to accelerating polymer discovery. However, the scarcity of labeled polymer data and the complex hierarchical structure of synthetic polymers make generative design particularly challenging. We advance the current state of the art by generating not only repeating units but monomer ensembles, including their stoichiometry and chain architecture. We build upon a recent polymer representation that includes the stoichiometries and chain architectures of monomer ensembles, and we develop a novel variational autoencoder (VAE) architecture that encodes a graph and decodes a string. A semi-supervised setup allows the model to handle partly labeled datasets, which can be beneficial in domains where only a small corpus of labeled data is available. Our model learns a continuous, well-organized latent space (LS) that enables the generation of copolymer structures with different monomer stoichiometries and chain architectures. In an inverse design case study, we apply our model to the discovery of novel conjugated copolymer photocatalysts for hydrogen production by optimizing the polymer's electron affinity and ionization potential in the latent space.
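The semi-supervised setup described in the abstract can be illustrated with a minimal sketch of a VAE objective in which the property-prediction term is computed only for samples that carry labels. This is not the authors' implementation: the function names, the weights `alpha` and `beta`, the use of NaN to mark missing labels, and the example numbers are all assumptions made for illustration, and the reconstruction term is passed in as a precomputed value rather than derived from a string decoder.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """Per-sample KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)

def masked_property_mse(pred, target, mask):
    """Mean squared error over labeled entries only (mask is True where a label exists)."""
    # Replace missing targets by 0 so NaN never enters the arithmetic;
    # those entries are zeroed out by the mask anyway.
    safe_target = np.where(mask, target, 0.0)
    sq_err = (pred - safe_target) ** 2 * mask
    n_labeled = mask.sum()
    return float(sq_err.sum() / n_labeled) if n_labeled > 0 else 0.0

def semi_supervised_loss(recon_nll, mu, logvar, pred, target, beta=1.0, alpha=1.0):
    """Reconstruction + beta * KL + alpha * property loss on the labeled subset.

    recon_nll: hypothetical mean reconstruction loss from the decoder.
    target:    property labels, with NaN marking unlabeled entries (an assumption).
    """
    mask = ~np.isnan(target)
    kl = float(kl_to_standard_normal(mu, logvar).mean())
    prop = masked_property_mse(pred, target, mask)
    return recon_nll + beta * kl + alpha * prop

# Two polymers, two properties each (e.g. electron affinity, ionization potential).
# Only one of the four property values is labeled; the numbers are made up.
mu = np.zeros((2, 4))
logvar = np.zeros((2, 4))
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.5, np.nan], [np.nan, np.nan]])
loss = semi_supervised_loss(2.0, mu, logvar, pred, target)
```

Because unlabeled entries contribute nothing to the property term, unlabeled polymers still shape the latent space through the reconstruction and KL terms, which is the general idea behind training on partly labeled data.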