Rincón Sergio, González Gabriel, Macías Mario A, Arbeláez Pablo
Crystallography and Chemistry of Materials, CrisQuimMat, Department of Chemistry, Universidad de los Andes, Bogotá, 111711, Colombia.
Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, 111711, Colombia.
Sci Data. 2025 Jul 10;12(1):1186. doi: 10.1038/s41597-025-05534-3.
Although crystal parameter prediction from powder X-ray diffraction has recently attracted the interest of the machine learning community, most existing datasets for this task are private and lack structural diversity. Here, we introduce the Simulated Powder X-ray Diffraction Open Database (SIMPOD), a new dataset that is public and structurally varied. This new benchmark includes 467,861 crystal structures from the Crystallography Open Database (COD) and their powder X-ray diffraction patterns. SIMPOD presents simulated one-dimensional powder X-ray diffractograms and derived two-dimensional radial images to facilitate the adoption of computer vision models for this task. We hope SIMPOD contributes to developing models that improve materials analysis from powder X-ray diffraction.
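As a rough illustration of what a "derived two-dimensional radial image" can look like, the sketch below paints a one-dimensional diffractogram onto a square grid so that each pixel's intensity depends only on its distance from the image center. This is an assumption made for illustration, not the procedure used to build SIMPOD: the function name `radial_image`, the linear mapping from radius to 2θ, and the choice to zero pixels beyond the inscribed circle are all hypothetical.

```python
import numpy as np


def radial_image(two_theta, intensity, size=64):
    """Render a 1D diffractogram as a rotationally symmetric 2D image.

    Hypothetical helper: the radius-to-angle mapping is an assumption,
    not the mapping used by the SIMPOD authors.
    """
    two_theta = np.asarray(two_theta, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    # Distance of every pixel from the geometric center of the image.
    center = (size - 1) / 2.0
    y, x = np.indices((size, size))
    radius = np.hypot(x - center, y - center)

    # Assumed linear mapping: radius 0 -> first angle, radius == center -> last angle.
    span = two_theta[-1] - two_theta[0]
    angle = two_theta[0] + (radius / center) * span

    # Interpolate the pattern at each pixel's angle; pixels outside the
    # inscribed circle fall beyond the measured range and are set to zero.
    flat = np.interp(angle.ravel(), two_theta, intensity, left=0.0, right=0.0)
    return flat.reshape(size, size)


if __name__ == "__main__":
    # Synthetic single-peak pattern, used only to exercise the function.
    tt = np.linspace(5.0, 90.0, 500)
    pattern = np.exp(-0.5 * ((tt - 30.0) / 0.5) ** 2)
    image = radial_image(tt, pattern, size=65)
    print(image.shape)
```

A representation of this kind lets standard image models consume the pattern directly, at the cost of redundancy, since every ring repeats a single value from the original one-dimensional signal.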