A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing.

Author information

Shetty Pranav, Rajan Arunkumar Chitteth, Kuenneth Chris, Gupta Sonakshi, Panchumarti Lakshmi Prerana, Holm Lauren, Zhang Chao, Ramprasad Rampi

Affiliations

School of Computational Science & Engineering, Atlanta, GA, USA.

School of Materials Science and Engineering, Georgia Institute of Technology, 771 Ferst Drive NW, Atlanta, GA 30332, USA.

Publication information

NPJ Comput Mater. 2023;9(1):52. doi: 10.1038/s41524-023-01003-w. Epub 2023 Apr 5.

Abstract

The ever-increasing number of materials science articles makes it hard to infer chemistry-structure-property relations from literature. We used natural language processing methods to automatically extract material property data from the abstracts of polymer literature. As a component of our pipeline, we trained MaterialsBERT, a language model, using 2.4 million materials science abstracts, which outperforms other baseline models in three out of five named entity recognition datasets. Using this pipeline, we obtained ~300,000 material property records from ~130,000 abstracts in 60 hours. The extracted data was analyzed for a diverse range of applications such as fuel cells, supercapacitors, and polymer solar cells to recover non-trivial insights. The data extracted through our pipeline is made available at polymerscholar.org which can be used to locate material property data recorded in abstracts. This work demonstrates the feasibility of an automatic pipeline that starts from published literature and ends with extracted material property information.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/649b/10073792/5e0c94a34511/41524_2023_1003_Fig1_HTML.jpg
