College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi 030000, China.
Comput Intell Neurosci. 2022 May 14;2022:3139898. doi: 10.1155/2022/3139898. eCollection 2022.
The Internet is rich in information related to the financial field. Financial entity text that contains new internet vocabulary degrades the results of existing recognition algorithms, and handling new vocabulary and polysemy remains an open problem in the field. This paper proposes ERNIE-Doc-BiLSTM-CRF, a named entity recognition model built on a pretrained language model. Compared with traditional models, the ERNIE-Doc pretrained language model builds a distinct vector for each word and combines it with positional encoding, which handles polysemy well. Its intensive skimming mechanism supports long-text processing and captures context information effectively. The experimental results show that the model achieves a precision (reported as accuracy) of 86.72%, a recall of 83.39%, and an F1 value of 85.02%, which are improvements of 13.36%, 13.05%, and 13.21%, respectively, over the other models.
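As a quick consistency check on the reported metrics, the sketch below recomputes the F1 value as the harmonic mean of precision and recall. It assumes that the 86.72% figure the abstract gives as accuracy is in fact precision; the function name `f1_score` and the variable names are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract, in percent.
# Assumption: the figure reported as "accuracy" is treated as precision.
reported_precision = 86.72
reported_recall = 83.39

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 85.02%"
```

Under that assumption the recomputed value matches the reported F1 of 85.02%, which suggests the three figures are mutually consistent.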