Machine-Learning Predictions of Critical Temperatures from Chemical Compositions of Superconductors.

Affiliations

Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.

ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.

Publication Information

J Chem Inf Model. 2024 Oct 14;64(19):7349-7375. doi: 10.1021/acs.jcim.4c01137. Epub 2024 Sep 17.

Abstract

In the quest for advanced superconducting materials, the accurate prediction of critical temperatures (Tc) poses a formidable challenge, largely due to the complex interdependencies between superconducting properties and the chemical and structural characteristics of a given material. To address this challenge, we have developed a machine-learning framework that aims to elucidate these complicated and hitherto poorly understood structure-property and property-property relationships. This study introduces a novel machine-learning-based workflow, termed Gradient Boosted Feature Selection (GBFS), which has been tailored to predict Tc for superconductors by employing a distributed gradient-boosting framework. This approach integrates exploratory data analyses, statistical evaluations, and multicollinearity reduction techniques to select highly relevant features from a high-dimensional feature space derived solely from the chemical composition of materials. Our methodology was rigorously tested on a data set comprising approximately 16,400 chemical compounds with around 12,000 unique chemical compositions. The GBFS workflow enabled the development of a classification model that distinguishes compositions likely to exhibit Tc values greater than 10 K. This model achieved a weighted average F1-score of 0.912, an AUC-ROC of 0.986, and an average precision score of 0.919. Additionally, the GBFS workflow underpinned a regression model that predicted Tc values with an R² of 0.945, an MAE of 3.54 K, and an RMSE of 6.57 K on a test set obtained via random splitting. Further exploration was conducted through out-of-sample predictions, particularly for Tc values above the liquid-nitrogen temperature, and out-of-distribution predictions for (CaLa)FeAs based on varying lanthanum content.
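The abstract reports classification metrics for a binary "Tc > 10 K" label and regression metrics (R², MAE, RMSE). As a minimal illustrative sketch (not the authors' code), these quantities can be computed in plain Python as below. The function names are hypothetical, and the weighted F1 follows the common support-weighted averaging convention (the same idea as scikit-learn's `average="weighted"`).

```python
import math
from typing import Sequence


def label_high_tc(tc_values: Sequence[float], threshold_k: float = 10.0) -> list[int]:
    """Binary label: 1 if Tc is strictly above the threshold (10 K in the abstract), else 0."""
    return [1 if tc > threshold_k else 0 for tc in tc_values]


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Per-class F1, averaged with weights proportional to each class's support in y_true."""
    n = len(y_true)
    total = 0.0
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * (tp + fn) / n  # tp + fn is the class support
    return total


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict[str, float]:
    """Coefficient of determination (R²), mean absolute error, and root-mean-square error."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {"r2": 1.0 - ss_res / ss_tot, "mae": mae, "rmse": rmse}
```

For example, `label_high_tc([5.0, 10.0, 39.0, 92.0])` gives `[0, 0, 1, 1]`: a compound at exactly 10 K falls in the low-Tc class because the criterion is "greater than 10 K".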
The outcome of our study underscores the significance of systematic feature analysis and selection in enhancing predictive model performance, offering various advantages over models that rely primarily on algorithmic complexity. This research not only advances the field of superconductivity but also sets a precedent for the application of machine learning in materials science.
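GBFS is described as combining multicollinearity reduction with feature relevance derived from gradient boosting. The sketch below illustrates one generic way such a step can work; it is not the published GBFS procedure. It assumes importance scores have already been obtained from a fitted gradient-boosting model. Features are visited from most to least important, and a feature is kept only if its absolute Pearson correlation with every already-kept feature stays below a threshold. The threshold of 0.9, `top_k`, and all feature names are hypothetical.

```python
import math
from typing import Mapping, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(
    features: Mapping[str, Sequence[float]],
    importance: Mapping[str, float],
    corr_threshold: float = 0.9,  # hypothetical cut-off for "highly collinear"
    top_k: int = 10,              # hypothetical maximum number of kept features
) -> list[str]:
    """Greedy importance-ranked selection with a pairwise |r| collinearity filter."""
    ranked = sorted(features, key=lambda name: importance.get(name, 0.0), reverse=True)
    kept: list[str] = []
    for name in ranked:
        if len(kept) == top_k:
            break
        # Drop a feature that is nearly redundant with a more important kept feature.
        if all(abs(pearson(features[name], features[k])) < corr_threshold for k in kept):
            kept.append(name)
    return kept
```

In a toy case where `feat_a_scaled` is an exact multiple of `feat_a` and ranks higher, `feat_a` is dropped as redundant. A weakly correlated `feat_b` is still retained.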

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/97a9/11481088/d746c46386b7/ci4c01137_0001.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验