Gómez-Peralta Juan Iván, Bokhimi Xim, Quintana Patricia
Laboratorio Nacional de Nano y Biomateriales, CINVESTAV-IPN, Antigua Carretera a Progreso km 6, A. P. 37, 97310 Mérida, Yucatán, Mexico.
Instituto de Física, Universidad Nacional Autónoma de México, A. P. 20-364, 01000 Ciudad de México, DF, Mexico.
J Phys Chem A. 2023 Sep 14;127(36):7655-7664. doi: 10.1021/acs.jpca.3c03860. Epub 2023 Aug 30.
This article presents the development of convolutional neural networks (CNNs) for estimating the lattice parameters of organic compounds across various crystal systems. A collection of 92,085 organic compounds was used to train the CNNs, covering crystals with unit cells containing up to 512 atoms and a maximum unit cell volume of 8000 Å³. Four simulated diffraction patterns, each with a different crystal size, were generated for each compound. These patterns were generated within a 2θ window of 3-60°, with a step size of 0.02051°. Two CNN architectures were developed that differ in their input data. The first CNN, referred to as XRD-CNN, was trained solely on diffraction patterns. On the test set, XRD-CNN showed a mean absolute percentage error (MAPE) of 11.04% for unit cell vectors, 7.40% for angles, and 26.83% for unit cell volume. The second CNN, XRDElem-CNN, took a binary representation of the atoms within the unit cell as an additional input. XRDElem-CNN achieved improved performance, yielding MAPE values of 4.73% for unit cell vectors, 6.49% for angles, and 6.05% for unit cell volume. To validate the performance of XRDElem-CNN, real diffraction patterns obtained from a conventional laboratory diffractometer (Cu Kα wavelength) were employed. Various representations of the atoms within the unit cell were proposed, all of which were computationally efficient to evaluate with the CNNs. The lattice parameters estimated by XRDElem-CNN were validated using the Lp-search method, showing agreement with the reported values.
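The abstract reports MAPE separately for unit cell vectors, angles, and volume. As a minimal sketch of how such figures relate, the Python below computes a MAPE and derives the cell volume from the six lattice parameters using the standard triclinic volume formula. The function names, the sample cell, and the way errors are averaged are illustrative assumptions, not the authors' code or evaluation protocol.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a general (triclinic) unit cell.

    Lengths in angstroms, angles in degrees; result in cubic angstroms.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumed definition)."""
    errors = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical reference and predicted cells (not data from the paper).
true_cell = (10.0, 10.0, 10.0, 90.0, 90.0, 90.0)
pred_cell = (11.0, 10.0, 10.0, 90.0, 90.0, 90.0)

length_mape = mape(true_cell[:3], pred_cell[:3])
angle_mape = mape(true_cell[3:], pred_cell[3:])
volume_mape = mape([cell_volume(*true_cell)], [cell_volume(*pred_cell)])

print(f"lengths: {length_mape:.2f}%  angles: {angle_mape:.2f}%  volume: {volume_mape:.2f}%")
```

Because the volume depends on all six parameters at once, its percentage error need not match the averaged per-parameter errors, which is why the abstract lists it as a separate metric.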