Dielectric Ceramics Database Automatically Constructed by Data Mining in the Literature.

Author information

Wang Xiaochao, Zhang Wanli, Zhang Wenxu

Affiliation

School of Integrated Circuits Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China.

Publication information

J Chem Inf Model. 2024 Aug 12;64(15):5931-5943. doi: 10.1021/acs.jcim.4c00282. Epub 2024 Jul 23.

Abstract

The vast body of published dielectric ceramics literature is a natural database for big-data analysis, for discovering structure-property relationships, and for property prediction. We constructed a data-mining pipeline based on natural language processing (NLP) to extract property information from about 12,900 published dielectric ceramics articles and normalized more than 20 properties. The micro-F1 scores for sentence classification, named entity recognition, relation extraction (related), and relation extraction (same) are 91.6, 82.4, 91.4, and 88.3%, respectively. We present the distributions of several essential properties by publication year to reveal trends over time. To test the reliability of the extracted data, we trained an XGBoost model to predict the dielectric constant and used the SHAP module to interpret the contribution of each feature, identifying some of the factors that determine dielectric properties. The results show that including × in the model can increase the accuracy of dielectric constant prediction. Our work may give experimentalists some hints on how to improve the performance of cutting-edge materials.
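The micro-F1 scores reported for the extraction tasks pool true positives, false positives and false negatives across all labels before computing precision and recall, so frequent labels weigh more than rare ones. The sketch below shows that calculation for exact-match evaluation. It is a minimal illustration, not the authors' evaluation code; the item format (tuples such as `(doc_id, start, end, label)`) and the example labels `MAT`, `PROP` and `VALUE` are hypothetical.

```python
from typing import Hashable, Iterable


def micro_f1(gold: Iterable[Hashable], pred: Iterable[Hashable]) -> float:
    """Micro-averaged F1 with exact matching (illustrative sketch).

    Each item is one labelled unit, e.g. an entity as (doc_id, start, end, label)
    or a relation as (doc_id, head, tail, type); these formats are hypothetical.
    Counts are pooled over all labels before computing precision and recall,
    which is what distinguishes micro- from macro-averaging.
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # predicted and present in the gold annotation
    fp = len(pred_set - gold_set)  # predicted but not in the gold annotation
    fn = len(gold_set - pred_set)  # gold items the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical entity annotations: (sentence_id, start_token, end_token, label)
gold = {(0, 0, 2, "MAT"), (0, 5, 6, "VALUE"), (1, 3, 4, "PROP")}
pred = {(0, 0, 2, "MAT"), (0, 5, 6, "PROP"), (1, 3, 4, "PROP")}

print(round(micro_f1(gold, pred), 3))  # 0.667 (TP=2, FP=1, FN=1)
```

Here the mislabelled span counts both as a false positive (wrong label predicted) and a false negative (correct label missed), which is the usual behaviour of strict exact-match scoring.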
