Chitturi Sathya R, Ratner Daniel, Walroth Richard C, Thampy Vivek, Reed Evan J, Dunne Mike, Tassone Christopher J, Stone Kevin H
Materials Science and Engineering, Stanford University, Stanford, CA 94305, USA.
SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA.
J Appl Crystallogr. 2021 Nov 30;54(Pt 6):1799-1810. doi: 10.1107/S1600576721010840. eCollection 2021 Dec 1.
A key step in the analysis of powder X-ray diffraction (PXRD) data is the accurate determination of unit-cell lattice parameters. This step often requires significant human intervention and is a bottleneck that hinders efforts towards automated analysis. This work develops a series of one-dimensional convolutional neural networks (1D-CNNs) trained to provide lattice parameter estimates for each crystal system. A mean absolute percentage error of approximately 10% is achieved for each crystal system, which corresponds to a 100- to 1000-fold reduction in lattice parameter search space volume. The models learn from nearly one million crystal structures contained within the Inorganic Crystal Structure Database and the Cambridge Structural Database and, owing to the complementary nature of these two databases, the models generalize well across chemistries. A key component of this work is a systematic analysis of the effect of different realistic experimental non-idealities on model performance. It is found that impurity phases, baseline noise and peak broadening present the greatest challenges to learning, while zero-offset error and random intensity modulations have little effect. However, appropriate data modification schemes can be used to bolster model performance and yield reasonable predictions, even for data that simulate realistic experimental non-idealities. To obtain accurate results, a new approach is introduced that combines the initial machine learning estimates with existing iterative whole-pattern refinement schemes to tackle automated unit-cell solution.
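As a rough illustration of how a percentage error on lattice parameters relates to a smaller search space, the following Python sketch computes a mean absolute percentage error and compares the volume of a prior search box with the volume of a box spanning plus or minus a relative error around each predicted parameter. The function names, the prior ranges and the box construction are assumptions made for illustration; they are not the authors' procedure, and the numbers are not taken from the paper.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0)


def search_volume_reduction(prior_ranges, predictions, rel_err=0.10):
    """Ratio of the prior search volume to the volume of a box that
    spans +/- rel_err around each predicted lattice parameter.

    prior_ranges: list of (low, high) bounds per parameter (hypothetical).
    predictions:  list of predicted values, one per parameter.
    """
    prior_volume = 1.0
    ml_volume = 1.0
    for (low, high), pred in zip(prior_ranges, predictions):
        prior_volume *= high - low            # width of the prior range
        ml_volume *= 2.0 * rel_err * pred     # width of the +/- rel_err window
    return prior_volume / ml_volume


# Hypothetical orthorhombic example: three cell lengths in angstroms,
# each searched over an assumed prior range of 2 to 20 angstroms.
true_abc = [6.2, 9.1, 13.5]
pred_abc = [6.0, 9.8, 14.4]
print("MAPE (%):", mape(true_abc, pred_abc))
print("Volume reduction factor:",
      search_volume_reduction([(2.0, 20.0)] * 3, pred_abc))
```

Because the reduction factor is a product over independent parameters, it grows with the number of free lattice parameters, so lower-symmetry crystal systems would be expected to benefit more under these assumptions.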