White Andrew J, Gibaldi Marco, Burner Jake, Mayo R Alex, Woo Tom K
Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Canada K1N 9A4.
J Am Chem Soc. 2025 May 28;147(21):17579-17583. doi: 10.1021/jacs.5c04914. Epub 2025 May 15.
"Computation-ready" metal-organic framework (MOF) databases provide essential raw data for high-throughput computational screening (HTS) and machine-learning approaches to materials discovery. However, the structural fidelity of these databases remains largely unquantified. We introduce MOSAEC, an algorithm that detects chemically invalid structures based on metal oxidation states. MOSAEC was manually validated against 14,796 MOF structures from the popular CoRE database and was found to flag erroneous structures with 96% accuracy. Examination of 14 leading experimental and hypothetical MOF databases, together containing >1.9 million structures, reveals structural error rates exceeding 40% in most cases. Analysis of 8 recent HTS studies that highlighted top-performing candidates shows that 52% of those candidate structures were chemically invalid.
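The abstract does not describe how MOSAEC works internally, so the following is only a minimal sketch of the general idea of screening by metal oxidation state, not the published algorithm. It assumes that the formal charges of all non-metal species in a structure are already known, derives the average metal oxidation state from charge balance, and flags the structure when that value is non-integer or outside a small table of common states. The names `COMMON_OXIDATION_STATES`, `implied_metal_oxidation_state`, and `is_suspect` are hypothetical, and the table is illustrative rather than exhaustive.

```python
from fractions import Fraction

# Illustrative, non-exhaustive table of common oxidation states (an assumption
# made for this sketch; a real tool would need a far more complete reference).
COMMON_OXIDATION_STATES = {
    "Zn": {2},
    "Cu": {1, 2},
    "Zr": {4},
    "Fe": {2, 3},
}


def implied_metal_oxidation_state(n_metals, ligand_charges, net_charge=0):
    """Average metal oxidation state implied by charge balance.

    n_metals:       number of metal atoms in the formula unit
    ligand_charges: formal charges of all non-metal species (assumed known)
    net_charge:     overall charge of the formula unit (0 for a neutral framework)
    """
    if n_metals <= 0:
        raise ValueError("n_metals must be positive")
    # Metals must carry whatever charge the ligands do not account for.
    return Fraction(net_charge - sum(ligand_charges), n_metals)


def is_suspect(metal, n_metals, ligand_charges, net_charge=0):
    """Return True when the implied oxidation state looks chemically implausible."""
    if metal not in COMMON_OXIDATION_STATES:
        raise KeyError(f"no reference oxidation states for {metal!r}")
    state = implied_metal_oxidation_state(n_metals, ligand_charges, net_charge)
    # A fractional average is flagged here; note that this would also flag
    # genuinely mixed-valence materials, which a real tool must handle.
    if state.denominator != 1:
        return True
    return int(state) not in COMMON_OXIDATION_STATES[metal]


# Four Zn balanced by one oxide and three dicarboxylate linkers: Zn(II), plausible.
print(is_suspect("Zn", 4, [-2, -2, -2, -2]))
# Two Zn with an extra -2 species (e.g. water modeled as oxide): Zn(III), flagged.
print(is_suspect("Zn", 2, [-2, -2, -2]))
```

A check of this kind catches errors that shift the charge balance, such as missing protons or counterions, which is why an implausible oxidation state can serve as a proxy for a structurally invalid entry.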