Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction.

Affiliations

Cavendish Laboratory, Department of Physics, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, UK.

ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, UK.

Publication information

Sci Data. 2018 Jun 19;5:180111. doi: 10.1038/sdata.2018.111.

Abstract

Large auto-generated databases of magnetic materials properties have the potential for great utility in materials science research. This article presents an auto-generated database of 39,822 records containing chemical compounds and their associated Curie and Néel magnetic phase transition temperatures. The database was produced using natural language processing and semi-supervised quaternary relationship extraction, applied to a corpus of 68,078 chemistry and physics articles. Evaluation of the database shows an estimated overall precision of 73%. Therein, records processed with the text-mining toolkit, ChemDataExtractor, were assisted by a modified Snowball algorithm, whose original binary relationship extraction capabilities were extended to quaternary relationship extraction. Consequently, its machine learning component can now train with ≤ 500 seeds, rather than the 4,000 originally used. Data processed with the modified Snowball algorithm affords 82% precision. Database records are available in MongoDB, CSV and JSON formats, which can easily be read using Python, R, Java and MATLAB. This makes the database easy to query for tackling big-data materials science initiatives and provides a basis for magnetic materials discovery.
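As a rough illustration of what querying such records in Python might look like, the sketch below loads a few JSON records and filters them by transition type and temperature. The field names (`compound`, `specifier`, `value`, `unit`) are assumptions chosen to mirror a four-part relationship and are not taken from the published schema; the sample rows are made-up placeholders, not data from the database.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionRecord:
    """One four-part record. Field names are hypothetical, not the real schema."""
    compound: str
    specifier: str  # assumed labels: "Curie" or "Neel"
    value: float
    unit: str


# Placeholder rows for illustration only; these are NOT entries from the database.
SAMPLE_JSON = """
[
  {"compound": "ExampleCompoundA", "specifier": "Curie", "value": 300.0, "unit": "K"},
  {"compound": "ExampleCompoundB", "specifier": "Neel", "value": 120.0, "unit": "K"},
  {"compound": "ExampleCompoundC", "specifier": "Curie", "value": 45.5, "unit": "K"}
]
"""


def load_records(text):
    """Parse a JSON array of objects into TransitionRecord instances."""
    return [TransitionRecord(**row) for row in json.loads(text)]


def filter_records(records, specifier, min_kelvin):
    """Keep records of one transition type at or above a temperature in kelvin."""
    return [
        r for r in records
        if r.specifier == specifier and r.unit == "K" and r.value >= min_kelvin
    ]


records = load_records(SAMPLE_JSON)
# Example query: Curie temperatures at or above 273.15 K.
high_tc = filter_records(records, "Curie", 273.15)
print([r.compound for r in high_tc])
```

With the real JSON export, `SAMPLE_JSON` would be replaced by the file contents, and the field names would need to be adjusted to match the actual record layout.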

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd06/6007086/1655ea78517c/sdata2018111-f1.jpg
