Cruse Kevin, Baibakova Viktoriia, Abdelsamie Maged, Hong Kootak, Bartel Christopher J, Trewartha Amalie, Jain Anubhav, Sutter-Fella Carolin M, Ceder Gerbrand
Department of Materials Science & Engineering, University of California, Berkeley, California 94720, United States.
Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States.
Chem Mater. 2023 Dec 29;36(2):772-785. doi: 10.1021/acs.chemmater.3c02203. eCollection 2024 Jan 23.
We used data-driven methods to understand the formation of impurity phases in BiFeO3 thin-film synthesis by the sol-gel technique. Using a high-quality dataset of 331 synthesis procedures and outcomes extracted manually from 177 scientific articles, we trained decision tree models that reinforce important experimental heuristics for avoiding phase impurities but ultimately show limited predictive capability. We find that several important synthesis features identified by our model are often not reported in the literature. To test our ability to correctly impute missing synthesis parameters, we attempted to reproduce nine syntheses from the literature with varying degrees of "missingness". We demonstrate how a text-mined dataset can be made useful by informing new controlled experiments and by building a better understanding of impurity phase formation in this complex oxide system.
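The two ideas named in the abstract, imputing unreported synthesis parameters and splitting procedures with a decision tree, can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state which features, imputation rule, or tree settings were used. The feature names (annealing temperature, Bi excess), the six toy records, the median imputation, and the single Gini-based split below are all assumptions made for illustration only.

```python
from statistics import median

# Toy, made-up records (NOT data from the paper):
# (annealing temperature in deg C, Bi excess in mol %, phase_pure).
# None marks a parameter that a hypothetical source article did not report.
records = [
    (500.0, 5.0, True),
    (550.0, None, True),
    (600.0, 0.0, False),
    (650.0, 0.0, False),
    (None, 10.0, True),
    (700.0, None, False),
]
N_FEATURES = 2


def impute_median(rows, n_features):
    """Replace None in each feature column with that column's median (an assumed rule)."""
    medians = [
        median(r[j] for r in rows if r[j] is not None) for j in range(n_features)
    ]
    return [
        tuple(r[j] if r[j] is not None else medians[j] for j in range(n_features))
        + (r[-1],)
        for r in rows
    ]


def gini(labels):
    """Gini impurity of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, n_features):
    """Return (weighted_gini, feature_index, threshold) of the best single split."""
    best = None
    for j in range(n_features):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r[-1] for r in rows if r[j] <= t]
            right = [r[-1] for r in rows if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


imputed = impute_median(records, N_FEATURES)
score, feature, threshold = best_split(imputed, N_FEATURES)
print(f"split on feature {feature} at {threshold}: weighted Gini = {score:.3f}")
```

A full decision tree applies `best_split` recursively to each side of the split. Note that the imputed values shape the split: if a frequently unreported parameter is also an important one, the choice of imputation rule can change which heuristic the tree appears to support.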