A solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph.

Authors

Yan Chenwei, Fang Xinyue, Huang Xiaotong, Guo Chenyi, Wu Ji

Affiliations

School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing, China.

Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China.

Publication information

Front Big Data. 2023 Sep 28;6:1278153. doi: 10.3389/fdata.2023.1278153. eCollection 2023.

DOI:10.3389/fdata.2023.1278153
PMID:37841897
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10569599/

Abstract

The knowledge graph is one of the essential infrastructures of artificial intelligence. Constructing a high-quality domain knowledge graph from multi-source heterogeneous data is a challenge for knowledge engineering. We propose a complete process framework for constructing a knowledge graph that combines structured and unstructured data. The framework includes data processing, information extraction, knowledge fusion, data storage, and update strategies, and aims to improve the quality of the knowledge graph and extend its life cycle. Specifically, we take the construction of an enterprise knowledge graph as an example and integrate enterprise registration information, litigation-related information, and enterprise announcement information to enrich the graph. For the unstructured text, we improve an existing model to extract triples, and our model reaches an F1-score of 72.77%. The numbers of nodes and edges in the constructed enterprise knowledge graph reach 1,430,000 and 3,170,000, respectively. Furthermore, for each type of multi-source heterogeneous data, we apply corresponding methods and strategies for information extraction and data storage and carry out a detailed comparative analysis of graph databases. From the perspective of practical use, the informative enterprise knowledge graph and its timely updates can serve many actual business needs. Our proposed enterprise knowledge graph has been deployed at HuaRong RongTong (Beijing) Technology Co., Ltd., where staff use it as a tool for corporate due diligence. The key features are reported and analyzed in the case study. Overall, this paper provides an easy-to-follow solution and practice for domain knowledge graph construction and demonstrates its application in corporate due diligence.
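
The abstract reports a triple-level F1-score and a fusion step that merges several sources into one graph. The Python sketch below illustrates both ideas under stated assumptions: exact-match scoring of (head, relation, tail) triples, and a toy name normalization standing in for knowledge fusion. The function names, relation labels, and sample entities are hypothetical and are not taken from the paper, whose actual model and fusion method are not described in this record.

```python
# Illustrative sketch only; all names and sample data are hypothetical.
Triple = tuple[str, str, str]  # (head entity, relation, tail entity)


def triple_f1(predicted: set[Triple], gold: set[Triple]) -> float:
    """Exact-match F1: a predicted triple counts only if all three parts match."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def normalize(name: str) -> str:
    """Toy entity normalization: collapse whitespace and lowercase.

    Assumption: this stands in for a real entity-alignment step.
    """
    return " ".join(name.split()).lower()


def fuse(*sources: list[Triple]) -> tuple[set[str], set[Triple]]:
    """Merge triples from several sources into one deduplicated node/edge set."""
    nodes: set[str] = set()
    edges: set[Triple] = set()
    for source in sources:
        for head, relation, tail in source:
            h, t = normalize(head), normalize(tail)
            nodes.update((h, t))
            edges.add((h, relation, t))
    return nodes, edges


# Hypothetical structured records (e.g. registration data).
registry = [
    ("Acme Ltd", "legal_representative", "Li Wei"),
    ("Acme Ltd", "shareholder", "Beta Holdings"),
]
# Hypothetical triples extracted from unstructured text.
extracted = [
    ("acme  ltd", "shareholder", "Beta Holdings"),  # same fact, different spelling
    ("Acme Ltd", "defendant_in", "Case 001"),
]

nodes, edges = fuse(registry, extracted)
print(len(nodes), len(edges))  # 4 nodes, 3 edges after deduplication

predicted = {("a", "r", "b"), ("a", "r", "c"), ("a", "r", "x")}
gold = {("a", "r", "b"), ("a", "r", "c"), ("a", "r", "d"), ("a", "r", "e")}
print(round(triple_f1(predicted, gold), 4))  # 0.5714 (precision 2/3, recall 1/2)
```

Storing edges as a set of normalized triples makes repeated facts from different sources collapse into a single edge, which is the simplest form of the deduplication a fusion step needs. A production system would replace `normalize` with a proper entity-alignment method and write the result to a graph database.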

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51df/10569599/e540fe0b6072/fdata-06-1278153-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51df/10569599/ccf51c6d4e89/fdata-06-1278153-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51df/10569599/b6498a6c8adb/fdata-06-1278153-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51df/10569599/ea29eeea3ba6/fdata-06-1278153-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51df/10569599/f89596cf57ac/fdata-06-1278153-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51df/10569599/992843008280/fdata-06-1278153-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51df/10569599/997098c18465/fdata-06-1278153-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51df/10569599/bdefd0600c8a/fdata-06-1278153-g0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51df/10569599/6796ae2835a3/fdata-06-1278153-g0009.jpg
