Extracting Family History Information From Electronic Health Records: Natural Language Processing Analysis.

Authors

Rybinski Maciej, Dai Xiang, Singh Sonit, Karimi Sarvnaz, Nguyen Anthony

Affiliations

Commonwealth Scientific and Industrial Research Organisation, Sydney, Australia.

University of Sydney, Sydney, Australia.

Publication details

JMIR Med Inform. 2021 Apr 30;9(4):e24020. doi: 10.2196/24020.

DOI: 10.2196/24020
PMID: 33664015
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8092929/
Abstract

BACKGROUND

The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes.

OBJECTIVE

The aim of this study is to develop automated methods that enable access to FH data through natural language processing.

METHODS

We performed information extraction by using transformers to extract disease mentions from notes. We also experimented with rule-based methods for extracting family member (FM) information from text and coreference resolution techniques. We evaluated different transfer learning strategies to improve the annotation of diseases. We provided a thorough error analysis of the contributing factors that affect such information extraction systems.
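The abstract does not spell out the rules used to extract family member (FM) information. The following minimal Python sketch only illustrates what a rule-based FM detector can look like. The term lexicon, the "maternal"/"paternal" qualifier pattern, and the names `FAMILY_TERMS`, `FM_PATTERN`, `FamilyMemberMention`, and `find_family_members` are hypothetical assumptions, not the authors' rules.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, deliberately small lexicon of family-member terms (assumption).
# Plurals ("sisters"), possessives, and pronouns are not handled.
FAMILY_TERMS = [
    "grandmother", "grandfather", "mother", "father", "sister", "brother",
    "daughter", "son", "aunt", "uncle", "cousin",
]

# Optional "maternal"/"paternal" qualifier immediately before a family-member term.
FM_PATTERN = re.compile(
    r"\b(?:(maternal|paternal)\s+)?(" + "|".join(FAMILY_TERMS) + r")\b",
    re.IGNORECASE,
)


@dataclass
class FamilyMemberMention:
    term: str            # lower-cased family-member term, e.g. "aunt"
    side: Optional[str]  # "maternal", "paternal", or None when not stated
    start: int           # character offset where the match begins
    end: int             # character offset just past the match


def find_family_members(note: str) -> List[FamilyMemberMention]:
    """Return all family-member mentions matched by the hypothetical rules above."""
    mentions = []
    for m in FM_PATTERN.finditer(note):
        side = m.group(1).lower() if m.group(1) else None
        mentions.append(FamilyMemberMention(m.group(2).lower(), side, m.start(), m.end()))
    return mentions
```

For example, `find_family_members("Her maternal aunt had breast cancer. Father is alive and well.")` returns two mentions: `aunt` with side `maternal`, and `father` with no side. Rules like these cannot link a later pronoun such as "she" back to the aunt. Linking pronouns to the people they refer to is the problem that coreference resolution techniques target.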

RESULTS

Our experiments showed that the combination of domain-adaptive pretraining and intermediate-task pretraining achieved an F1 score of 81.63% for the extraction of diseases and FMs from notes when it was tested on a public shared task data set from the National Natural Language Processing Clinical Challenges (N2C2), providing a statistically significant improvement over the baseline (P<.001). In comparison, in the 2019 N2C2/Open Health Natural Language Processing Shared Task, the median F1 score of all 17 participating teams was 76.59%.
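The F1 score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), and is reported above as a percentage. The sketch below computes micro-averaged precision, recall, and F1 for entity extraction using strict exact matching on (label, start, end) tuples. The matching rule, the `Entity` tuple layout, the label names `Disease` and `FamilyMember`, and the function name `precision_recall_f1` are illustrative assumptions. The official N2C2 scorer may match and average entities differently.

```python
from typing import Iterable, Tuple

# An entity is identified by (label, start, end). Strict exact matching on all
# three fields is an assumption; shared-task scorers may use other rules.
Entity = Tuple[str, int, int]


def precision_recall_f1(
    gold: Iterable[Entity], predicted: Iterable[Entity]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exactly matching entities."""
    gold_set = set(gold)
    pred_set = set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Consider three gold entities and three predictions, two of which match a gold entity exactly. Precision and recall are both 2/3, so F1 is also 2/3, or about 66.67% on the percentage scale used in the abstract.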

CONCLUSIONS

Our approach, which leverages a state-of-the-art named entity recognition model for disease mention detection coupled with a hybrid method for FM mention detection, achieved an effectiveness that was close to that of the top 3 systems participating in the 2019 N2C2 FH extraction challenge, with only the top system convincingly outperforming our approach in terms of precision.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a89/8092929/7086e2db24da/medinform_v9i4e24020_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a89/8092929/398be964ea6f/medinform_v9i4e24020_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a89/8092929/31136309967e/medinform_v9i4e24020_fig2.jpg

Similar articles

1
Extracting Family History Information From Electronic Health Records: Natural Language Processing Analysis.
JMIR Med Inform. 2021 Apr 30;9(4):e24020. doi: 10.2196/24020.
2
Family History Extraction From Synthetic Clinical Narratives Using Natural Language Processing: Overview and Evaluation of a Challenge Data Set and Solutions for the 2019 National NLP Clinical Challenges (n2c2)/Open Health Natural Language Processing (OHNLP) Competition.
JMIR Med Inform. 2021 Jan 27;9(1):e24008. doi: 10.2196/24008.
3
A Hybrid Model for Family History Information Identification and Relation Extraction: Development and Evaluation of an End-to-End Information Extraction System.
JMIR Med Inform. 2021 Apr 22;9(4):e22797. doi: 10.2196/22797.
4
The 2019 n2c2/OHNLP Track on Clinical Semantic Textual Similarity: Overview.
JMIR Med Inform. 2020 Nov 27;8(11):e23375. doi: 10.2196/23375.
5
Acquisition of a Lexicon for Family History Information: Bidirectional Encoder Representations From Transformers-Assisted Sublanguage Analysis.
JMIR Med Inform. 2023 Jun 27;11:e48072. doi: 10.2196/48072.
6
Task definition, annotated dataset, and supervised natural language processing models for symptom extraction from unstructured clinical notes.
J Biomed Inform. 2020 Feb;102:103354. doi: 10.1016/j.jbi.2019.103354. Epub 2019 Dec 12.
7
The 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task on clinical concept normalization for clinical records.
J Am Med Inform Assoc. 2020 Oct 1;27(10):1529-1537. doi: 10.1093/jamia/ocaa106.
8
Extraction of Family History Information From Clinical Notes: Deep Learning and Heuristics Approach.
JMIR Med Inform. 2020 Dec 29;8(12):e22898. doi: 10.2196/22898.
9
Extraction of Information Related to Drug Safety Surveillance From Electronic Health Record Notes: Joint Modeling of Entities and Relations Using Knowledge-Aware Neural Attentive Models.
JMIR Med Inform. 2020 Jul 10;8(7):e18417. doi: 10.2196/18417.
10
The 2022 n2c2/UW shared task on extracting social determinants of health.
J Am Med Inform Assoc. 2023 Jul 19;30(8):1367-1378. doi: 10.1093/jamia/ocad012.

Cited by

1
Improving Clinical Documentation with Artificial Intelligence: A Systematic Review.
Perspect Health Inf Manag. 2024 Jun 1;21(2):1d. eCollection 2024 Summer-Fall.
2
Enhancing patient representation learning with inferred family pedigrees improves disease risk prediction.
J Am Med Inform Assoc. 2025 Mar 1;32(3):435-446. doi: 10.1093/jamia/ocae297.
3
Internet-Based Abnormal Chromosomal Diagnosis During Pregnancy Using a Noninvasive Innovative Approach to Detecting Chromosomal Abnormalities in the Fetus: Scoping Review.
JMIR Bioinform Biotechnol. 2024 Oct 16;5:e58439. doi: 10.2196/58439.
4
Use of Machine Learning Tools in Evidence Synthesis of Tobacco Use Among Sexual and Gender Diverse Populations: Algorithm Development and Validation.
JMIR Form Res. 2024 Jan 24;8:e49031. doi: 10.2196/49031.
5
Deployment of Real-time Natural Language Processing and Deep Learning Clinical Decision Support in the Electronic Health Record: Pipeline Implementation for an Opioid Misuse Screener in Hospitalized Adults.
JMIR Med Inform. 2023 Apr 20;11:e44977. doi: 10.2196/44977.
6
Clinician documentation of patient centered care in the electronic health record.
BMC Med Inform Decis Mak. 2022 Mar 12;22(1):65. doi: 10.1186/s12911-022-01794-w.
7
Comparison of a Focused Family Cancer History Questionnaire to Family History Documentation in the Electronic Medical Record.
J Prim Care Community Health. 2022 Jan-Dec;13:21501319211069756. doi: 10.1177/21501319211069756.
8
The development of a machine learning algorithm to identify occupational injuries in agriculture using pre-hospital care reports.
Health Inf Sci Syst. 2021 Jul 29;9(1):31. doi: 10.1007/s13755-021-00161-9. eCollection 2021 Dec.