Family History Extraction From Synthetic Clinical Narratives Using Natural Language Processing: Overview and Evaluation of a Challenge Data Set and Solutions for the 2019 National NLP Clinical Challenges (n2c2)/Open Health Natural Language Processing (OHNLP) Competition.

Authors

Shen Feichen, Liu Sijia, Fu Sunyang, Wang Yanshan, Henry Sam, Uzuner Ozlem, Liu Hongfang

Affiliations

Division of Digital Health Sciences, Mayo Clinic, Rochester, MN, United States.

Department of Information Sciences and Technology, George Mason University, Fairfax, VA, United States.

Publication information

JMIR Med Inform. 2021 Jan 27;9(1):e24008. doi: 10.2196/24008.

PMID: 33502329
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7875692/

Abstract

BACKGROUND

As a risk factor for many diseases, family history (FH) captures both shared genetic variations and living environments among family members. Though there are several systems focusing on FH extraction using natural language processing (NLP) techniques, the evaluation protocol of such systems has not been standardized.

OBJECTIVE

The n2c2/OHNLP (National NLP Clinical Challenges/Open Health Natural Language Processing) 2019 FH extraction task aims to encourage community efforts toward standardized evaluation and system development for FH extraction from synthetic clinical narratives.

METHODS

We organized the first BioCreative/OHNLP FH extraction shared task in 2018. We continued the shared task in 2019 in collaboration with the n2c2 and OHNLP consortium, and organized the 2019 n2c2/OHNLP FH extraction track. The shared task comprises 2 subtasks. Subtask 1 focuses on identifying family member entities and clinical observations (diseases). Subtask 2 requires extracting the associations of living status, side of the family, and clinical observations with family members. Subtask 2 is an end-to-end task that builds on the results of subtask 1. We manually curated the first deidentified clinical narrative from FH sections of clinical notes at Mayo Clinic Rochester, the content of which is highly relevant to patients' FH.
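
As a rough illustration of the two subtasks described above, the following Python sketch models subtask 1 output as a list of entity mentions and subtask 2 output as one record per family member. All class names, field names, labels, and example values are hypothetical; the abstract does not specify the challenge's annotation schema or file format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """Subtask 1: a family member or clinical observation mention."""
    kind: str  # "FamilyMember" or "Observation" (labels assumed)
    text: str


@dataclass
class FamilyMemberRecord:
    """Subtask 2: attributes associated with one family member."""
    member: str
    side: str = "NA"            # side of the family (values assumed)
    living_status: str = "NA"   # representation assumed
    observations: list = field(default_factory=list)


def build_records(entities, attributes, observed):
    """Assemble subtask 2 records from subtask 1 entities.

    attributes: member text -> (side, living_status)
    observed:   list of (member text, observation text) pairs
    Only members and observations present in the subtask 1 output are
    kept, mirroring the dependency of subtask 2 on subtask 1.
    """
    members = {e.text for e in entities if e.kind == "FamilyMember"}
    observations = {e.text for e in entities if e.kind == "Observation"}
    records = {}
    for m in sorted(members):
        side, status = attributes.get(m, ("NA", "NA"))
        records[m] = FamilyMemberRecord(m, side, status)
    for m, obs in observed:
        if m in records and obs in observations:
            records[m].observations.append(obs)
    return records


# Hypothetical example input, not taken from the challenge data.
entities = [
    Entity("FamilyMember", "Grandmother"),
    Entity("FamilyMember", "Father"),
    Entity("Observation", "breast cancer"),
]
records = build_records(
    entities,
    {"Grandmother": ("Maternal", "deceased")},
    [("Grandmother", "breast cancer"), ("Uncle", "diabetes")],
)
```

In this sketch, a relation whose family member was missed in subtask 1 (here "Uncle") cannot be recovered in subtask 2, which is one way errors can propagate in an end-to-end setup.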

RESULTS

A total of 17 teams from all over the world participated in the n2c2/OHNLP FH extraction shared task, where 38 runs were submitted for subtask 1 and 21 runs were submitted for subtask 2. For subtask 1, the top 3 runs were generated by Harbin Institute of Technology, ezDI, Inc., and The Medical University of South Carolina with F1 scores of 0.8745, 0.8225, and 0.8130, respectively. For subtask 2, the top 3 runs were from Harbin Institute of Technology, ezDI, Inc., and University of Florida with F1 scores of 0.681, 0.6586, and 0.6544, respectively. The workshop was held in conjunction with the AMIA 2019 Fall Symposium.
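
The F1 scores reported above are the harmonic mean of precision and recall. A minimal sketch of how such a score could be computed over sets of predicted and gold items follows. Exact-match set comparison and the function name `f1_score` are assumptions made here for illustration; the abstract does not describe the matching rules of the official scoring.

```python
def f1_score(predicted, gold):
    """Return F1 for predicted vs. gold items using exact set matching (assumed)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical entity sets, not taken from the challenge data.
gold = {
    ("FamilyMember", "Mother"),
    ("Observation", "diabetes"),
    ("Observation", "stroke"),
}
pred = {
    ("FamilyMember", "Mother"),
    ("Observation", "diabetes"),
    ("Observation", "asthma"),
}
score = f1_score(pred, gold)  # precision and recall are both 2/3 here
```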

CONCLUSIONS

Teams used a wide variety of methods in both subtasks, including Bidirectional Encoder Representations from Transformers, convolutional neural networks, bidirectional long short-term memory, conditional random fields, support vector machines, and rule-based strategies. System performance shows that relation extraction from FH is more challenging than entity identification.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4389/7875692/960398e904c4/medinform_v9i1e24008_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4389/7875692/d078abd05071/medinform_v9i1e24008_fig2.jpg