Information extraction from green channel textual records on expressways using hybrid deep learning.

Author information

Chen Jiaona, Zhang Jing, Tao Weijun, Jin Yinli, Fan Heng

Affiliations

Xi'an Shiyou University School of Electronic Engineering, Xi'an, 710065, China.

Chang'an University School of Electronic and Control Engineering, Xi'an, 710065, China.

Publication information

Sci Rep. 2024 Dec 28;14(1):31269. doi: 10.1038/s41598-024-82681-4.

Abstract

The expressway green channel is an essential transportation policy for moving fresh agricultural products in China. To extract knowledge from the records it generates, this study presents an approach for extracting information from textual records of failure cases in the vertical domain of the expressway green channel. We propose a hybrid approach based on BIO labeling, a pre-trained model, deep learning, and a conditional random field (CRF) to build a named entity recognition (NER) model with the best prediction performance. Eight entity types are defined for NER in the expressway green channel domain. Three typical pre-trained natural language processing models are used and compared for recognizing entities and obtaining feature vectors: bidirectional encoder representations from transformers (BERT), ALBERT, and RoBERTa. An ablation experiment is performed to analyze the influence of each factor on the proposed models. Using survey data from the expressway green channel management system in Shaanxi Province, China, the experiments show that the precision, recall, and F1-score of the RoBERTa-BiGRU-CRF model are 93.04%, 92.99%, and 92.99%, respectively. The results indicate that text features extracted by pre-training substantially improve the prediction accuracy of deep learning algorithms. Surprisingly, the RoBERTa model is highly effective for expressway green channel NER. This study provides timely and necessary knowledge extraction from expressway green channel textual data, offering a systematic explanation of failure cases and valuable insights for future research.
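To make the BIO labeling and the reported metrics concrete, the following is a minimal sketch of how a BIO tag sequence can be decoded into entity spans and scored with entity-level precision, recall, and F1. It is an illustration only, not the authors' code: the entity names `VEHICLE` and `CARGO` are hypothetical placeholders (the abstract does not name the eight entity types), and the abstract does not state which averaging scheme was used for the reported figures, so the micro-averaged exact-match scoring here is an assumption.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode one BIO tag sequence into a set of entity spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        # An I- tag continues the open entity only if the types match.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        if label is not None:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # Stray I- tags with no matching open entity are ignored.
    return spans


def entity_prf(gold: List[List[str]],
               pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged, exact-match entity precision, recall, and F1."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_to_spans(g), bio_to_spans(p)
        tp += len(gs & ps)       # spans with identical type and boundaries
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: the second entity is predicted with a wrong boundary.
gold = [["B-VEHICLE", "I-VEHICLE", "O", "B-CARGO", "I-CARGO", "O"]]
pred = [["B-VEHICLE", "I-VEHICLE", "O", "B-CARGO", "O", "O"]]
precision, recall, f1 = entity_prf(gold, pred)
```

Under exact-match scoring, a prediction with the right type but a wrong boundary counts as both a false positive and a false negative, which is why boundary-aware decoders such as a CRF layer are commonly placed on top of the encoder in NER models.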
