Cavendish Laboratory, Department of Physics, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, U.K.
ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
J Chem Inf Model. 2022 Dec 26;62(24):6365-6377. doi: 10.1021/acs.jcim.2c00035. Epub 2022 May 9.
A great number of scientific papers are published every year in the field of battery research, which forms a huge textual data source. However, it is difficult to explore and retrieve useful information efficiently from these large unstructured sets of text. The Bidirectional Encoder Representations from Transformers (BERT) model, trained on a large data set in an unsupervised way, provides a route to process the scientific text automatically with minimal human effort. To this end, we realized six battery-related BERT models, namely, BatteryBERT, BatteryOnlyBERT, and BatterySciBERT, each of which consists of both cased and uncased models. They have been trained specifically on a corpus of battery research papers. The pretrained BatteryBERT models were then fine-tuned on downstream tasks, including battery paper classification and extractive question-answering for battery device component classification that distinguishes anode, cathode, and electrolyte materials. Our BatteryBERT models were found to outperform the original BERT models on the specific battery tasks. The fine-tuned BatteryBERT was then used to perform battery database enhancement. We also provide a website application for its interactive use and visualization.
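As a rough illustration of what extractive question-answering involves, the sketch below shows the span-selection step that BERT-style readers commonly use: the model scores every token as a possible answer start and answer end, and the answer is the highest-scoring valid span. This is not the authors' implementation. The function names, the whitespace tokenization, and the scores are all invented for illustration, and a real model would produce the scores from a question such as "What is the cathode?".

```python
from typing import List, Tuple


def best_answer_span(
    start_scores: List[float],
    end_scores: List[float],
    max_answer_len: int = 8,
) -> Tuple[int, int, float]:
    """Return (start, end, score) of the best span with start <= end.

    A span's score is the sum of its start-token and end-token scores.
    Spans longer than max_answer_len tokens are not considered.
    """
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_scores):
        last = min(i + max_answer_len, len(end_scores))
        for j in range(i, last):
            score = s + end_scores[j]
            if score > best[2]:
                best = (i, j, score)
    return best


def extract_answer(
    tokens: List[str],
    start_scores: List[float],
    end_scores: List[float],
    max_answer_len: int = 8,
) -> str:
    """Join the tokens of the best-scoring span into an answer string."""
    i, j, _ = best_answer_span(start_scores, end_scores, max_answer_len)
    return " ".join(tokens[i : j + 1])


# Toy context, split on whitespace (an assumption; BERT uses WordPiece tokens).
tokens = ["The", "cathode", "was", "LiFePO4", "and", "the", "anode", "was", "graphite"]

# Made-up scores standing in for model output to the question
# "What is the cathode?".
start_scores = [0.1, 0.3, 0.0, 4.2, 0.0, 0.1, 0.2, 0.0, 1.5]
end_scores = [0.0, 0.2, 0.1, 3.9, 0.0, 0.0, 0.1, 0.0, 1.8]

answer = extract_answer(tokens, start_scores, end_scores)
print(answer)
```

Asking a separate question per component (anode, cathode, electrolyte) and collecting the extracted spans is one way such a reader could be used to label materials by device role.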