XAS Data Preprocessing of Nanocatalysts for Machine Learning Applications.

Authors

Kartashov Oleg O, Chernov Andrey V, Polyanichenko Dmitry S, Butakova Maria A

Affiliation

The Smart Materials Research Institute, Southern Federal University, 178/24 Sladkova, 344090 Rostov-on-Don, Russia.

Publication information

Materials (Basel). 2021 Dec 20;14(24):7884. doi: 10.3390/ma14247884.

Abstract

Innovation in the energy and chemical industries depends largely on advances in the accelerated design and development of new functional materials. Successful research on new nanocatalysts relies mainly on modern techniques for their precise characterization. Existing experimental characterization methods, which make it possible to assess whether these materials are suitable for specific chemical reactions or applications, generate large amounts of heterogeneous data. Accelerating the development of new functional materials, including nanocatalysts, depends directly on the speed and quality with which hidden dependencies and knowledge are extracted from the experimental data. Such experiments usually involve several characterization techniques, including different types of X-ray absorption spectroscopy (XAS). Machine learning (ML) methods based on XAS data make it possible to study and predict the atomic-scale structure and a range of other parameters of a nanocatalyst efficiently. However, before any ML model is used, the raw experimental XAS data must be properly preprocessed, cleaned, and prepared for ML application. The XAS preprocessing stage is usually presented only vaguely in scientific studies, with researchers devoting most of their effort to describing and implementing the ML stage. Yet the quality of the input data influences the quality of the ML analysis and of the resulting predictions. This paper fills the gap between obtaining XAS data from synchrotron facilities and using and customizing various ML analysis and prediction models. The aim of this study was to develop automated tools for preprocessing and presenting data from physical experiments and for creating deposited datasets, using the study of palladium-based nanocatalysts at synchrotron radiation facilities as an example.
During the study, methods for the preliminary processing of XAS data were considered, which can be conditionally divided into methods for X-ray absorption near edge structure (XANES) data and for extended X-ray absorption fine structure (EXAFS) data. This paper proposes a software toolkit that implements data preprocessing scenarios as a single pipeline. The main preprocessing methods used in this study are principal component analysis (PCA); z-score normalization; the interquartile method for eliminating outliers in the data; and the k-means machine learning method, which makes it possible to clarify the phase of the studied material sample by clustering the feature vectors of experiments. The results of this study also include deposited datasets of physical experiments on palladium-based nanocatalysts obtained using synchrotron radiation. These datasets will allow further high-quality data mining to extract new knowledge about materials using artificial intelligence methods and ML models, and will ensure their smooth dissemination to researchers and their reuse.
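
The four methods named in the abstract can be chained roughly as in the following minimal sketch. It is not the authors' toolkit: the order of the steps, the choice to apply the interquartile rule to each spectrum's mean intensity, the farthest-point initialization of k-means, and all function names are assumptions made for illustration. The spectra are synthetic sigmoid "edges" with added noise, not data from the paper.

```python
import numpy as np


def iqr_mask(values, k=1.5):
    """True for values inside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)


def zscore(X):
    """Scale every feature (energy point) to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    return (X - X.mean(axis=0)) / std


def pca(X, n_components=2):
    """Project centered data onto its leading principal axes (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def kmeans(X, k=2, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def preprocess(spectra, n_components=2, n_clusters=2):
    """Outlier removal -> z-score -> PCA -> k-means (assumed ordering)."""
    keep = iqr_mask(spectra.mean(axis=1))  # assumption: one summary per spectrum
    scores = pca(zscore(spectra[keep]), n_components)
    labels, _ = kmeans(scores, n_clusters)
    return keep, scores, labels


# Synthetic stand-in for measured spectra: two "phases" whose absorption
# edge is shifted, plus one spectrum with a gross intensity error.
rng = np.random.default_rng(0)
energy = np.linspace(0.0, 1.0, 60)


def edge(e0):
    return 1.0 / (1.0 + np.exp(-(energy - e0) / 0.1))


spectra = np.vstack([
    edge(0.45) + rng.normal(0.0, 0.001, size=(20, 60)),  # phase A
    edge(0.55) + rng.normal(0.0, 0.001, size=(20, 60)),  # phase B
    5.0 * edge(0.45)[None, :],                           # faulty measurement
])

keep, scores, labels = preprocess(spectra)
print("spectra kept:", int(keep.sum()), "of", len(spectra))
print("cluster labels:", labels)
```

Applying the interquartile rule to a single summary value per spectrum, rather than to every energy point separately, is a deliberate simplification here: with many noisy features, a per-feature rule would discard a large share of otherwise valid spectra.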

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e60/8709119/a22867e32b30/materials-14-07884-g001.jpg
