College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China.
Beijing Institute of Health Administration and Medical Information, Beijing, 100850, China.
BMC Med Inform Decis Mak. 2020 Jul 9;20(Suppl 3):135. doi: 10.1186/s12911-020-1112-5.
Hepatocellular carcinoma is one of the most common malignant neoplasms in adults and has high mortality. Mining relevant medical knowledge from rapidly growing text data and integrating it with existing biomedical resources can support research on hepatocellular carcinoma. To this end, we constructed a knowledge graph for Hepatocellular Carcinoma (KGHC).
We propose an approach to building a knowledge graph for hepatocellular carcinoma. Specifically, we first extracted knowledge from structured and unstructured data. Because the extracted entities may contain noise, we applied a biomedical information extraction system, named BioIE, to filter the data in KGHC. We then introduced a fusion method to merge the extracted data. Finally, we stored the data in Neo4j, which helps researchers analyze the network of hepatocellular carcinoma.
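The fuse-then-store steps can be illustrated with a minimal sketch. It is not the fusion method or the BioIE filtering used for KGHC, which the abstract does not detail. It assumes fusion can be approximated by deduplicating triples after simple name normalization, and that each triple is written to Neo4j with a Cypher `MERGE` statement. The names `Triple`, `normalize`, `fuse`, `to_cypher`, and the `Entity` node label are hypothetical.

```python
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    """One (head, relation, tail) fact; a hypothetical container."""
    head: str
    relation: str
    tail: str


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so surface variants compare equal."""
    return " ".join(name.lower().split())


def fuse(*sources: Iterable[Triple]) -> list[Triple]:
    """Merge triples from several sources, keeping the first of each duplicate.

    Assumption: two triples are the same fact when their normalized
    head, relation, and tail match. Real entity alignment is harder.
    """
    seen = set()
    fused = []
    for source in sources:
        for t in source:
            key = (normalize(t.head), t.relation.upper(), normalize(t.tail))
            if key not in seen:
                seen.add(key)
                fused.append(Triple(*key))
    return fused


def to_cypher(t: Triple) -> tuple[str, dict]:
    """Build a parameterized Cypher MERGE statement for one triple."""
    # Relationship types cannot be passed as Cypher parameters,
    # so the type is validated before being placed in the query text.
    rel = t.relation.replace(" ", "_")
    if not rel.replace("_", "").isalnum():
        raise ValueError(f"unsafe relation type: {t.relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:`{rel}`]->(t)"
    )
    return query, {"head": t.head, "tail": t.tail}
```

Each query and parameter dictionary returned by `to_cypher` could then be passed to a Neo4j session. Using `MERGE` rather than `CREATE` keeps repeated loads from duplicating nodes and relationships.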
KGHC contains 13,296 triples and provides knowledge about hepatocellular carcinoma to healthcare professionals, sparing them from digging through large amounts of biomedical literature. This could improve the efficiency of research on hepatocellular carcinoma. KGHC is freely accessible for academic research purposes at http://202.118.75.18:18895/browser/ .
In this paper, we present a knowledge graph for hepatocellular carcinoma, constructed from large amounts of structured and unstructured data. The evaluation results show that the data in KGHC are of high quality.