Corriero Nicola, Rizzi Rosanna, Settembre Gaetano, Del Buono Nicoletta, Diacono Domenico
Institute of Crystallography, CNR, Bari, Italy.
Department of Mathematics, University of Bari Aldo Moro, Bari, Italy.
J Appl Crystallogr. 2023 Feb 28;56(Pt 2):409-419. doi: 10.1107/S1600576723000596. eCollection 2023 Apr 1.
Determination of the crystal system and space group is the first step of crystal structure analysis. For polycrystalline compounds this step is often a bottleneck in the material characterization workflow and requires manual intervention. This work proposes a new machine-learning (ML)-based web platform, CrystalMELA (Crystallography MachinE LeArning), for crystal system classification. Two ML models, a random forest and a convolutional neural network, are available through the platform, together with the extremely randomized trees algorithm taken from the literature. The models were trained on simulated powder X-ray diffraction patterns of more than 280 000 published crystal structures of organic, inorganic and metal-organic compounds and minerals, collected from the POW_COD database. In tenfold cross-validation, a crystal system classification accuracy of 70% was obtained, rising to more than 90% for Top-2 classification accuracy (the correct crystal system is among the two highest-ranked predictions). The trained models have also been tested against independent experimental data of published compounds. The classification options in the platform are powerful, easy to use and supported by a user-friendly graphical interface, and they can be extended over time with contributions from the community. The tool is freely available at https://www.ba.ic.cnr.it/softwareic/crystalmela/ following registration.
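The difference between the plain and the Top-2 accuracy quoted above can be illustrated with a minimal sketch. It is not the platform's code: the function names (top_k_accuracy, k_fold_indices), the ordering of the seven crystal systems and the probability values are all assumptions made for illustration, and the fold splitter is a plain, unshuffled and unstratified one.

```python
from typing import Iterator, List, Sequence, Tuple

# Assumed label order for illustration only; the platform's encoding may differ.
CRYSTAL_SYSTEMS = ["triclinic", "monoclinic", "orthorhombic", "tetragonal",
                   "trigonal", "hexagonal", "cubic"]


def top_k_accuracy(probs: Sequence[Sequence[float]],
                   true_labels: Sequence[int], k: int = 1) -> float:
    """Fraction of samples whose true class is among the k highest-scored classes."""
    if not probs or len(probs) != len(true_labels):
        raise ValueError("probs and true_labels must be non-empty and equally long")
    hits = 0
    for row, label in zip(probs, true_labels):
        # Class indices sorted from most to least probable.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


def k_fold_indices(n_samples: int,
                   n_folds: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every sample is tested exactly once."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical class probabilities for four diffraction patterns (made-up numbers).
example_probs = [
    [0.05, 0.60, 0.20, 0.05, 0.04, 0.03, 0.03],  # true: monoclinic, ranked 1st
    [0.10, 0.45, 0.35, 0.04, 0.03, 0.02, 0.01],  # true: orthorhombic, ranked 2nd
    [0.02, 0.03, 0.05, 0.10, 0.40, 0.30, 0.10],  # true: hexagonal, ranked 2nd
    [0.50, 0.30, 0.10, 0.04, 0.03, 0.02, 0.01],  # true: cubic, ranked last
]
example_labels = [1, 2, 5, 6]

top1 = top_k_accuracy(example_probs, example_labels, k=1)
top2 = top_k_accuracy(example_probs, example_labels, k=2)
```

On these four made-up patterns top1 is 0.25 and top2 is 0.75: two predictions that miss at rank one are recovered at rank two, which is the effect behind the reported rise from 70% to more than 90%. In a tenfold setting, the metric would be computed on the test indices of each fold from k_fold_indices and then averaged.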