Chen Yanrui, Chen Guangwu, Li Peng
School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China.
Key Laboratory of Plateau Traffic Information Engineering and Control of Gansu Province, Lanzhou Jiaotong University, Lanzhou 730070, China.
Sensors (Basel). 2024 Nov 6;24(22):7128. doi: 10.3390/s24227128.
Handling track circuit equipment faults generates a large volume of unstructured maintenance text. Reusing that knowledge efficiently, and automating knowledge graph construction for the railway maintenance domain, requires extracting relational triplets from fault maintenance text. Given that joint extraction technology in the railway domain lags behind and resources are used inefficiently, this paper proposes a joint extraction model for track circuit entities and relations that integrates Global Pointer and tensor learning. To account for the associations among semantic relations, the nesting of railway domain terms, and semantic diversity, the model treats relation extraction as a tensor learning process and entity recognition as a span-based Global Pointer search process. First, a multi-layer dilated gated convolutional neural network with residual connections extracts key features and fuses weighted information from the 12 semantic layers of the RoBERTa-wwm-ext model, so that every encoding layer contributes. Next, Tucker decomposition captures the semantic correlations between relations, and an Efficient Global Pointer globally predicts the start and end positions of subject and object entities, incorporating relative position information through rotary position embedding (RoPE). Finally, the model was compared with existing mainstream joint extraction models, and its performance was validated on the English public datasets NYT and WebNLG, the Chinese public dataset DuIE, and a private track circuit dataset. The F1 scores on the NYT, WebNLG, and DuIE public datasets reached 92.1%, 92.7%, and 78.2%, respectively.
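The span-based search with RoPE described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`rope`, `span_scores`, `decode`), the toy vectors, and the decoding threshold of 0 are assumptions made for illustration, and the learned projections, the Efficient Global Pointer factorization, and the Tucker-based relation scoring are all omitted. The sketch only shows the general idea that every candidate span (start, end) receives a score from a start-side query and an end-side key, and that rotating both by their positions makes the score depend on the relative offset.

```python
import math


def rope(vec, pos, base=10000.0):
    """Apply a rotary position embedding: rotate each consecutive pair of
    components by an angle proportional to the token position `pos`."""
    d = len(vec)  # assumed even
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # one frequency per pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def span_scores(queries, keys):
    """Score every candidate span (start, end) with start <= end.
    `queries[i]` and `keys[i]` are hypothetical per-token vectors that a
    real model would obtain from learned projections of the encoder output."""
    n = len(queries)
    scores = {}
    for start in range(n):
        q = rope(queries[start], start)
        for end in range(start, n):
            k = rope(keys[end], end)
            scores[(start, end)] = dot(q, k)
    return scores


def decode(scores, threshold=0.0):
    """Keep spans whose score exceeds the threshold (0 is an assumption)."""
    return sorted(span for span, s in scores.items() if s > threshold)


if __name__ == "__main__":
    # Toy 4-dimensional vectors for a 3-token sentence (made-up numbers).
    queries = [[1.0, 0.0, 0.5, 0.5], [0.2, -0.3, 0.1, 0.9], [-1.0, 0.4, 0.0, 0.3]]
    keys = [[0.9, 0.1, 0.4, 0.6], [-0.5, 0.2, 0.3, -0.7], [0.1, 0.8, -0.2, 0.5]]
    print(decode(span_scores(queries, keys)))
```

Because all spans are scored in one table, a span nested inside another (common for railway terms) is just another cell in that table and can be predicted independently of the span that contains it.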