Kim Hyunseung, Na Jonggeol, Lee Won Bo
School of Chemical and Biological Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, Republic of Korea.
Department of Chemical Engineering and Materials Science, Graduate Program in System Health Science and Engineering, Ewha Womans University, Seoul 03760, Republic of Korea.
J Chem Inf Model. 2021 Dec 27;61(12):5804-5814. doi: 10.1021/acs.jcim.1c01289. Epub 2021 Dec 2.
Discovering new materials better suited to specific purposes is important for improving the quality of human life. Here, we propose the generative chemical Transformer (GCT), a neural network that generates molecules meeting multiple desired target conditions, based on a deep understanding of chemical language. By attending sparsely to characters, the attention mechanism in GCT captures molecular structure more deeply and overcomes a limitation of chemical language itself: its tendency to cause semantic discontinuity. The significance of language models for inverse molecular design problems is investigated by quantitatively evaluating the quality of the generated molecules. GCT generates highly realistic chemical strings that satisfy both chemical and linguistic grammar rules. Molecules parsed from the generated strings simultaneously satisfy the multiple target properties and remain diverse for a single set of conditions. These advances will contribute to improving the quality of human life by accelerating the discovery of desired materials.
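In a chemical string such as SMILES (assumed here as the chemical language), atoms that are bonded in the molecule can sit far apart in the text. In phenol, `c1ccccc1O`, the bond closing the ring is marked only by the matching digits at string positions 1 and 7. Attention gives every token a direct path to every earlier token, however distant. The following is a minimal, generic sketch of causal scaled dot-product self-attention over a character-tokenized SMILES string. It is not GCT's actual architecture. The character-level tokenizer, the random embeddings, and the absence of learned query/key/value projections are all simplifying assumptions made for illustration.

```python
import math
import random

def tokenize_smiles(smiles):
    # Character-level split. Simplification: practical SMILES tokenizers keep
    # multi-character tokens such as "Cl", "Br" or "[nH]" together.
    return list(smiles)

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(q, k, v):
    """Scaled dot-product attention with a causal mask: position i attends
    only to positions 0..i, as in an autoregressive (generative) decoder."""
    d = len(q[0])
    outputs, weights = [], []
    for i in range(len(q)):
        # Similarity of query i to each key at or before position i, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        w = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out = [sum(w[j] * v[j][c] for j in range(i + 1)) for c in range(len(v[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Hypothetical toy setup: random 8-dimensional embedding per character and
# q = k = v = embeddings (a real Transformer applies learned projections).
rng = random.Random(0)
tokens = tokenize_smiles("c1ccccc1O")  # phenol
vocab = sorted(set(tokens))
embed = {ch: [rng.gauss(0.0, 1.0) for _ in range(8)] for ch in vocab}
x = [embed[t] for t in tokens]
outputs, weights = causal_self_attention(x, x, x)

# The ring-closing digit "1" at index 7 can weigh the opening "1" at index 1
# directly, even though five characters separate them in the string.
print(tokens[7], [round(w, 3) for w in weights[7]])
```

With untrained random embeddings the printed weights carry no chemical meaning. In a trained model, the learned projections are what would let the closing digit concentrate its attention on the distant token that opened the ring.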