Badrinarayanan Srivathsan, Magar Rishikesh, Antony Akshay, Meda Radheesh Sharma, Barati Farimani Amir
Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.
Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.
J Chem Inf Model. 2025 Sep 8;65(17):9049-9060. doi: 10.1021/acs.jcim.5c01625. Epub 2025 Aug 28.
The discovery of metal-organic frameworks (MOFs) with application-specific properties remains a central challenge in materials chemistry, owing to the immense size and complexity of their structural design space. Conventional computational screening techniques such as molecular simulations and density functional theory (DFT), while accurate, are computationally prohibitive at scale. Machine learning offers a data-driven alternative that can accelerate materials discovery. The complexity of MOFs, with their extended periodic structures and diverse topologies, creates both opportunities and challenges for generative modeling. To address these challenges, we present a reinforcement learning-enhanced, transformer-based framework for the de novo design of MOFs. Central to our approach is MOFid, a chemically informed string representation that encodes both connectivity and topology and thereby enables scalable generative modeling. Our pipeline comprises three components: (1) a generative GPT model trained on MOFid sequences, (2) MOFormer, a transformer-based property predictor, and (3) a reinforcement learning (RL) module that optimizes generated candidates via property-guided reward functions. By integrating property feedback into sequence generation, our method steers the model toward synthesizable, topologically valid MOFs with desired functional attributes. This work demonstrates the potential of large language models, when coupled with reinforcement learning, to accelerate inverse design in reticular chemistry and to open new directions in computational MOF discovery.
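The three-component loop named in the abstract (generator, property predictor, RL reward) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three-slot token vocabulary stands in for MOFid strings, a table of independent logits stands in for the GPT generator, the hard-coded score table stands in for MOFormer, and the update rule is plain REINFORCE with a moving-average baseline, since the abstract does not say which RL algorithm is used.

```python
import math
import random

# Hypothetical vocabulary: one metal node, one linker, one topology code.
# Real MOFid strings are far richer; this is only a stand-in.
SLOTS = [["Zn", "Cu", "Zr"], ["BDC", "BTC", "BPDC"], ["pcu", "tbo", "fcu"]]

# Made-up per-token scores playing the role of a learned property predictor.
TOY_SCORES = [[0.1, 0.3, 0.9], [0.2, 0.8, 0.4], [0.5, 0.1, 0.7]]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, rng):
    """Generator step: draw one token index per slot from the policy."""
    idxs = []
    for slot_logits in logits:
        probs = softmax(slot_logits)
        idxs.append(rng.choices(range(len(probs)), weights=probs)[0])
    return idxs


def toy_predictor(idxs):
    """Predictor step: score a candidate (assumed additive toy property)."""
    return sum(TOY_SCORES[s][i] for s, i in enumerate(idxs))


def expected_reward(logits):
    """Exact mean reward of the policy, used to monitor training."""
    return sum(
        p * score
        for slot_logits, scores in zip(logits, TOY_SCORES)
        for p, score in zip(softmax(slot_logits), scores)
    )


def train(steps=2000, lr=0.1, seed=0):
    """RL step: REINFORCE with a moving-average baseline."""
    rng = random.Random(seed)
    logits = [[0.0] * len(slot) for slot in SLOTS]
    baseline = 0.0
    for _ in range(steps):
        idxs = sample(logits, rng)
        reward = toy_predictor(idxs)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for s, chosen in enumerate(idxs):
            probs = softmax(logits[s])
            for j, p in enumerate(probs):
                # Gradient of log pi(chosen) with respect to logit j.
                grad = (1.0 if j == chosen else 0.0) - p
                logits[s][j] += lr * advantage * grad
    return logits


def greedy(logits):
    """Most probable token in each slot under the current policy."""
    return [
        slot[max(range(len(slot_logits)), key=slot_logits.__getitem__)]
        for slot, slot_logits in zip(SLOTS, logits)
    ]


if __name__ == "__main__":
    trained = train()
    print("mean reward:", round(expected_reward(trained), 3))
    print("greedy design:", ".".join(greedy(trained)))
```

In this sketch the reward is the raw predictor score. A fuller version would presumably also reward or penalize string validity, which is one way the "topologically valid" goal in the abstract could enter the reward function.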