Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction.

Author information

Cavendish Laboratory, Department of Physics, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, UK.

ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, UK.

Publication information

Sci Data. 2018 Jun 19;5:180111. doi: 10.1038/sdata.2018.111.

Abstract

Large auto-generated databases of magnetic materials properties have the potential for great utility in materials science research. This article presents an auto-generated database of 39,822 records containing chemical compounds and their associated Curie and Néel magnetic phase transition temperatures. The database was produced using natural language processing and semi-supervised quaternary relationship extraction, applied to a corpus of 68,078 chemistry and physics articles. Evaluation of the database shows an estimated overall precision of 73%. Therein, records processed with the text-mining toolkit, ChemDataExtractor, were assisted by a modified Snowball algorithm, whose original binary relationship extraction capabilities were extended to quaternary relationship extraction. Consequently, its machine learning component can now train with ≤ 500 seeds, rather than the 4,000 originally used. Data processed with the modified Snowball algorithm affords 82% precision. Database records are available in MongoDB, CSV and JSON formats, which can easily be read using Python, R, Java and MATLAB. This makes the database easy to query for tackling big-data materials science initiatives and provides a basis for magnetic materials discovery.
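As an illustration of how records in the JSON format might be queried from Python, the following is a minimal sketch only. The field names (`compound`, `specifier`, `value`, `unit`), the placeholder compound names, and the temperature values are all assumptions made for this example and are not taken from the published database; the actual schema should be checked against the released files.

```python
import json

# Hypothetical records. The field names and every value below are
# illustrative assumptions, not entries from the published database.
SAMPLE_JSON = """
[
  {"compound": "compound-A", "specifier": "Curie", "value": 300.0, "unit": "K"},
  {"compound": "compound-B", "specifier": "Neel",  "value": 120.0, "unit": "K"},
  {"compound": "compound-C", "specifier": "Curie", "value": 50.0,  "unit": "C"},
  {"compound": "compound-D", "specifier": "Curie", "value": 900.0, "unit": "K"}
]
"""


def to_kelvin(value, unit):
    """Convert a temperature to kelvin; only 'K' and 'C' are handled here."""
    if unit == "K":
        return value
    if unit == "C":
        return value + 273.15
    raise ValueError(f"unsupported unit: {unit!r}")


def filter_records(records, specifier, min_k, max_k):
    """Return records of one transition type whose temperature, converted
    to kelvin, lies in the closed interval [min_k, max_k]."""
    hits = []
    for rec in records:
        if rec["specifier"] != specifier:
            continue
        temp_k = to_kelvin(rec["value"], rec["unit"])
        if min_k <= temp_k <= max_k:
            hits.append(rec)
    return hits


if __name__ == "__main__":
    records = json.loads(SAMPLE_JSON)
    # Compounds with a Curie temperature between 250 K and 400 K.
    for rec in filter_records(records, "Curie", 250.0, 400.0):
        print(rec["compound"], to_kelvin(rec["value"], rec["unit"]))
```

Each hypothetical record here mirrors the four parts of a quaternary relationship (compound, specifier, value, unit); normalising units before filtering avoids missing records reported in different temperature scales.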

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd06/6007086/1655ea78517c/sdata2018111-f1.jpg
