Mining heart disease risk factors in clinical text with named entity recognition and distributional semantic models.

Author information

Urbain Jay

Affiliations

Milwaukee School of Engineering, Milwaukee, WI, United States; CTSI of SE Wisconsin/Medical College of Wisconsin, Milwaukee, WI, United States.

Publication information

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S143-S149. doi: 10.1016/j.jbi.2015.08.009. Epub 2015 Aug 21.

DOI:10.1016/j.jbi.2015.08.009

PMID:26305514

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC4984540/

Abstract

We present the design and analyze the performance of a multi-stage natural language processing system that employs named entity recognition, Bayesian statistics, and rule logic to identify and characterize heart disease risk factor events in diabetic patients over time. The system was originally developed for the 2014 i2b2 Challenges in Natural Language in Clinical Data. Its main strength was high accuracy in identifying named entities associated with heart disease risk factor events. Its primary weakness was inaccuracy in characterizing the attributes of some events: for example, determining the time of an event relative to the record date, deciding whether an event belongs to the patient's own history or to the family history, and differentiating between current and prior smoking status. We believe these inaccuracies were due in large part to the lack of an effective approach for integrating context into our event detection model. To address them, we explore the addition of a distributional semantic model for characterizing contextual evidence of heart disease risk factor events. Using this semantic model, we raised our initial 2014 i2b2 challenge F1 score from 0.838 to 0.890 and increased precision by 10.3%, without using any lexicons that might bias our results.
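
The abstract does not describe how the distributional semantic model is built, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes a simple bag-of-words context window around a risk factor mention and compares that context to per-label centroids by cosine similarity. The function names, the window size, the labels, and the example sentences are all hypothetical and invented for illustration.

```python
import math
from collections import Counter


def context_vector(tokens, index, window=3):
    """Count the words within `window` positions of tokens[index], excluding it."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return Counter(t for i, t in enumerate(tokens[lo:hi], start=lo) if i != index)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_centroids(examples, window=3):
    """Sum the context vectors of labeled examples: (sentence, target, label)."""
    centroids = {}
    for sentence, target, label in examples:
        tokens = sentence.lower().split()
        vec = context_vector(tokens, tokens.index(target), window)
        centroids.setdefault(label, Counter()).update(vec)
    return centroids


def classify(sentence, target, centroids, window=3):
    """Return the label whose centroid is most similar to the mention's context."""
    tokens = sentence.lower().split()
    vec = context_vector(tokens, tokens.index(target), window)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical snippets, not taken from the i2b2 corpus.
EXAMPLES = [
    ("mother had diabetes and father had cad", "diabetes", "family_history"),
    ("family history of diabetes in her sister", "diabetes", "family_history"),
    ("patient has diabetes controlled with metformin", "diabetes", "patient"),
    ("he presents with diabetes and takes insulin", "diabetes", "patient"),
]

centroids = build_centroids(EXAMPLES)
label = classify("her father had diabetes", "diabetes", centroids)
print(label)
```

A sketch like this uses no lexicon: the only evidence is which words tend to surround a mention in labeled examples, which matches the abstract's emphasis on contextual evidence. A real system would need far more training data and a richer vector representation.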

Similar articles

Using local lexicalized rules to identify heart disease risk factors in clinical notes.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S183-S188. doi: 10.1016/j.jbi.2015.06.013. Epub 2015 Jun 29.

An automatic system to identify heart disease risk factors in clinical texts over time.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S158-S163. doi: 10.1016/j.jbi.2015.09.002. Epub 2015 Sep 8.

Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge.

J Biomed Inform. 2015 Dec;58 Suppl(0):S120-S127. doi: 10.1016/j.jbi.2015.06.030. Epub 2015 Jul 22.

Combining glass box and black box evaluations in the identification of heart disease risk factors and their temporal relations from clinical records.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S133-S142. doi: 10.1016/j.jbi.2015.06.014. Epub 2015 Jul 2.

Adapting existing natural language processing resources for cardiovascular risk factors identification in clinical notes.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S128-S132. doi: 10.1016/j.jbi.2015.08.002. Epub 2015 Aug 28.

Coronary artery disease risk assessment from unstructured electronic health records using text mining.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S203-S210. doi: 10.1016/j.jbi.2015.08.003. Epub 2015 Aug 28.

Risk factor detection for heart disease by applying text analytics in electronic medical records.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S164-S170. doi: 10.1016/j.jbi.2015.08.011. Epub 2015 Aug 14.

The role of fine-grained annotations in supervised recognition of risk factors for heart disease from EHRs.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S111-S119. doi: 10.1016/j.jbi.2015.06.010. Epub 2015 Jun 26.

A context-aware approach for progression tracking of medical concepts in electronic medical records.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S150-S157. doi: 10.1016/j.jbi.2015.09.013. Epub 2015 Sep 30.

Cited by

The validity of electronic health data for measuring smoking status: a systematic review and meta-analysis.

BMC Med Inform Decis Mak. 2024 Feb 2;24(1):33. doi: 10.1186/s12911-024-02416-3.

Using Natural Language Processing to Measure and Improve Quality of Diabetes Care: A Systematic Review.

J Diabetes Sci Technol. 2021 May;15(3):553-560. doi: 10.1177/19322968211000831. Epub 2021 Mar 19.

Clinical concept extraction: A methodology review.

J Biomed Inform. 2020 Sep;109:103526. doi: 10.1016/j.jbi.2020.103526. Epub 2020 Aug 6.

Systematic Evaluation of Research Progress on Natural Language Processing in Medicine Over the Past 20 Years: Bibliometric Study on PubMed.

J Med Internet Res. 2020 Jan 23;22(1):e16816. doi: 10.2196/16816.

Assessing Information Congruence of Documented Cardiovascular Disease between Electronic Dental and Medical Records.

AMIA Annu Symp Proc. 2018 Dec 5;2018:1442-1450. eCollection 2018.

Feature extraction for phenotyping from semantic and knowledge resources.

J Biomed Inform. 2019 Mar;91:103122. doi: 10.1016/j.jbi.2019.103122. Epub 2019 Feb 7.

Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling.

Comput Math Methods Med. 2018 Jul 22;2018:2497471. doi: 10.1155/2018/2497471. eCollection 2018.

Combining information from a clinical data warehouse and a pharmaceutical database to generate a framework to detect comorbidities in electronic health records.

BMC Med Inform Decis Mak. 2018 Jan 24;18(1):9. doi: 10.1186/s12911-018-0586-x.

A new synonym-substitution method to enrich the human phenotype ontology.

BMC Bioinformatics. 2017 Oct 10;18(1):446. doi: 10.1186/s12859-017-1858-7.

Automatic prediction of coronary artery disease from clinical narratives.

J Biomed Inform. 2017 Aug;72:23-32. doi: 10.1016/j.jbi.2017.06.019. Epub 2017 Jun 27.

References

Practical applications for natural language processing in clinical research: The 2014 i2b2/UTHealth shared tasks.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S1-S5. doi: 10.1016/j.jbi.2015.10.007. Epub 2015 Oct 24.

Identifying synonymy between SNOMED clinical terms of varying length using distributional analysis of electronic health records.

AMIA Annu Symp Proc. 2013 Nov 16;2013:600-9. eCollection 2013.

Semantic interoperation and electronic health records: context sensitive mapping from SNOMED CT to ICD-10.

Stud Health Technol Inform. 2013;192:603-7.

Passage relevance models for genomics search.

BMC Bioinformatics. 2009 Mar 19;10 Suppl 3(Suppl 3):S3. doi: 10.1186/1471-2105-10-S3-S3.
