On building a diabetes centric knowledge base via mining the web.

Affiliations

Shanghai Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Pu'an Road, Shanghai, China.

Shanghai Leyan Technologies Co. Ltd, No. 1028 Panyu Road, Shanghai, China.

Publication information

BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):49. doi: 10.1186/s12911-019-0771-6.

DOI:10.1186/s12911-019-0771-6

PMID:30961582

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6454670/

Abstract

BACKGROUND

Diabetes has become one of the hot topics in life science research. To support their analyses, researchers and analysts spend considerable labor collecting experimental data, a process that is also error-prone. To reduce this cost and ensure data quality, there is a growing trend toward extracting clinical events, in the form of knowledge, from electronic medical records (EMRs). Doing so first requires a high-coverage knowledge base (KB) for the specific disease to support these extraction tasks, an approach called KB-based extraction.

METHODS

We propose an approach to building a diabetes-centric knowledge base (DKB) by mining the Web. In particular, we first extract knowledge from the semi-structured content of vertical portals, fuse the knowledge extracted from each site, and map it to a unified KB. The target DKB is then extracted from the unified KB using a distance-based Expectation-Maximization (EM) algorithm.
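The abstract does not spell out the fusion rule or the form of the distance-based EM step, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each site's KB is a list of (head, relation, tail) triples, that fusion matches entities by normalized name, that "distance" means hop distance from a seed entity such as "diabetes", and that EM fits a two-component Poisson mixture (near vs. far) over those distances. All function names and the mixture model are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict, deque


def fuse(site_kbs):
    """Union (head, relation, tail) triples from several per-site KBs.

    Assumption: entities are matched by normalized surface name only.
    """
    fused = set()
    for triples in site_kbs:
        for head, rel, tail in triples:
            fused.add((head.strip().lower(), rel, tail.strip().lower()))
    return fused


def hop_distances(triples, seed):
    """Breadth-first hop distance from the seed, ignoring edge direction."""
    adj = defaultdict(set)
    for head, _, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def _poisson(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def near_posteriors(distances, iters=100):
    """EM for a two-component Poisson mixture over hop distances.

    Returns P(near | d) for every distinct distance d.
    """
    n = len(distances)
    lam_near, lam_far, w = 1.0, max(distances) + 1.0, 0.5

    def posterior(d):
        p_near = w * _poisson(d, lam_near)
        p_far = (1.0 - w) * _poisson(d, lam_far)
        return p_near / (p_near + p_far)

    for _ in range(iters):
        # E-step: responsibility of the "near" component for each distance.
        resp = [posterior(d) for d in distances]
        # M-step: re-estimate both rates and the mixing weight (clamped
        # so that neither component can vanish entirely).
        total = min(max(sum(resp), 1e-9), n - 1e-9)
        near_sum = sum(r * d for r, d in zip(resp, distances))
        far_sum = sum((1.0 - r) * d for r, d in zip(resp, distances))
        lam_near = max(near_sum / total, 1e-6)
        lam_far = max(far_sum / (n - total), 1e-6)
        w = total / n
    return {d: posterior(d) for d in set(distances)}


def extract_subset(triples, seed, threshold=0.5):
    """Keep triples whose two entities both fall in the 'near' component."""
    dist = hop_distances(triples, seed)
    others = [d for entity, d in dist.items() if entity != seed]
    if not others:
        return set()
    post = near_posteriors(others)
    keep = {seed} | {e for e, d in dist.items()
                     if e != seed and post[d] > threshold}
    return {(h, r, t) for h, r, t in triples if h in keep and t in keep}
```

Calling `extract_subset(fuse([site_a, site_b]), "diabetes")` on two toy triple lists returns the triples whose entities sit close to the seed. Entities unreachable from the seed are dropped outright, which is a simplification in this sketch rather than something the abstract states.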

RESULTS

In our experiments, we selected eight popular vertical portals in China as data sources for constructing the DKB. The final diabetes KB contains 7703 instances and 96,041 edges, covering diseases, symptoms, western medicines, traditional Chinese medicines, examinations, departments, and body structures. The accuracy of the DKB is 95.91%. Besides assessing the quality of the knowledge extracted from the vertical portals, we also carried out detailed experiments evaluating the knowledge fusion performance and the convergence of the distance-based EM algorithm, with positive results.

CONCLUSIONS

In this paper, we introduced an approach to constructing a DKB. A knowledge extraction and fusion pipeline was first used to extract semi-structured data from vertical portals, and the individual KBs were then fused into a unified knowledge base. After that, we developed a distance-based Expectation-Maximization algorithm to extract a subset of the unified knowledge base, forming the target DKB. Experiments showed that the data in the DKB are rich and of high quality.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3458/6454670/bc4d122f276c/12911_2019_771_Fig1_HTML.jpg
