Wu Jiajing, Wei Zhiqiang, Jia Dongning, Dou Xin, Tang Huo, Li Nannan
School of Information Science and Engineering, Ocean University of China, Qingdao, China.
China Research Institute of Radiowave Propagation, Qingdao, China.
PeerJ Comput Sci. 2022 Sep 5;8:e1083. doi: 10.7717/peerj-cs.1083. eCollection 2022.
Creating and maintaining a domain-specific database of research institutions, academic experts and scholarly literature is essential to advancing national marine science and technology. Knowledge graphs (KGs) are now widely used in both industry and academia to address real-world problems. Despite the abundance of generic KGs, there remains a pressing need for domain-specific knowledge graphs in the marine sciences. In addition, there is still no effective method for named entity recognition when constructing a knowledge graph, especially when the data come from both scientific and social media sources. This article presents a novel knowledge graph framework for the marine science domain, which captures marine domain data in KG representations. The proposed approach uses various entity information based on marine domain experts to enrich the semantic content of the knowledge graph. To improve named entity recognition accuracy, we propose a novel TrellisNet-CRF model. Our experimental results show that the TrellisNet-CRF model reached 96.99% accuracy on marine domain named entity recognition, outperforming the current state-of-the-art baseline. The effectiveness of the TrellisNet-CRF module was further confirmed on entity recognition and visualization tasks.
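The abstract does not describe how the CRF layer of the TrellisNet-CRF model works, so the following is only a minimal sketch of generic linear-chain CRF decoding (the Viterbi algorithm), not the authors' implementation. It assumes that an encoder such as TrellisNet would supply one emission score per tag per token. The tag set, the emission scores and the transition scores below are all made-up illustrative values, and `viterbi_decode` is a hypothetical helper name.

```python
from typing import List, Sequence

# Hypothetical BIO tag set for institution names (an assumption, not from the paper).
TAGS = ["O", "B-ORG", "I-ORG"]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence for a linear-chain CRF.

    emissions[t][j]   : score of tag j at token t (would come from the encoder).
    transitions[i][j] : score of moving from tag i to tag j.
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # score[j] = best score of any path that ends in tag j at the current token.
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(num_tags):
            # Best previous tag for reaching tag j at token t.
            best_prev = max(range(num_tags),
                            key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j]
                             + emissions[t][j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the path.
    path = [max(range(num_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Made-up scores for a three-token span. Picking the best tag per token
# independently would give O, I-ORG, I-ORG, which is not a valid BIO sequence.
emissions = [
    [1.0, 0.9, 0.0],
    [0.1, 0.0, 2.0],
    [0.2, 0.0, 1.5],
]
# The O -> I-ORG transition is heavily penalised; all others are neutral.
transitions = [
    [0.0, 0.0, -100.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)  # ['B-ORG', 'I-ORG', 'I-ORG']
```

The point of the sketch is that the transition scores let the decoder reject tag sequences that are locally attractive but globally inconsistent, which is the usual motivation for placing a CRF layer on top of a sequence encoder in named entity recognition.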