Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach.

Authors

Shi Jianlin, Morgan Keaton L, Bradshaw Richard L, Jung Se-Hee, Kohlmann Wendy, Kaphingst Kimberly A, Kawamoto Kensaku, Del Fiol Guilherme

Affiliations

Veterans Affairs Informatics and Computing Infrastructure, Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, UT, United States.

Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah, Salt Lake City, UT, United States.

Publication information

JMIR Med Inform. 2022 Aug 11;10(8):e37842. doi: 10.2196/37842.

DOI:10.2196/37842
PMID:35969459
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9412758/
Abstract

BACKGROUND

Family health history has been recognized as an essential factor for cancer risk assessment and is an integral part of many cancer screening guidelines, including genetic testing for personalized clinical management strategies. However, manually identifying eligible candidates for genetic testing is labor intensive.

OBJECTIVE

The aim of this study was to develop a natural language processing (NLP) pipeline and assess its contribution to identifying patients who meet genetic testing criteria for hereditary cancers based on family health history data in the electronic health record (EHR). We compared an algorithm that uses structured data alone with structured data augmented using NLP.

METHODS

Algorithms were developed based on the National Comprehensive Cancer Network (NCCN) guidelines for genetic testing for hereditary breast, ovarian, pancreatic, and colorectal cancers. The NLP-augmented algorithm uses both structured family health history data and the associated unstructured free-text comments. The algorithms were compared with a reference standard of 100 patients with a family health history in the EHR.
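The idea of augmenting structured family health history entries with their free-text comments can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `FamilyHistoryEntry` fields, the regular expression, and the age-based rule are hypothetical placeholders, not the authors' pipeline and not an actual NCCN criterion.

```python
import re
from dataclasses import dataclass, replace
from typing import Iterable, Optional


@dataclass(frozen=True)
class FamilyHistoryEntry:
    """Hypothetical shape of one structured family history row."""
    relative: str
    condition: str
    age_of_onset: Optional[int] = None  # often empty in structured data
    comment: str = ""                   # associated free-text comment


# Hypothetical pattern: catches phrases such as "at age 42", "at 38", "age 60".
_AGE_PATTERN = re.compile(r"\b(?:at|age|aged)\s+(?:age\s+)?(\d{1,3})\b", re.IGNORECASE)

FIRST_DEGREE = {"mother", "father", "sister", "brother", "daughter", "son"}


def augment_with_comment(entry: FamilyHistoryEntry) -> FamilyHistoryEntry:
    """Fill a missing age of onset from the free-text comment, if one is found."""
    if entry.age_of_onset is not None:
        return entry
    match = _AGE_PATTERN.search(entry.comment)
    if match is None:
        return entry
    return replace(entry, age_of_onset=int(match.group(1)))


def meets_placeholder_rule(
    entries: Iterable[FamilyHistoryEntry],
    condition: str = "breast cancer",
    max_age: int = 50,
) -> bool:
    """Illustrative rule only: a first-degree relative with `condition`
    diagnosed at or before `max_age`. Not an actual NCCN criterion."""
    return any(
        e.relative.lower() in FIRST_DEGREE
        and e.condition.lower() == condition
        and e.age_of_onset is not None
        and e.age_of_onset <= max_age
        for e in entries
    )


# Invented example record: the age appears only in the comment.
entry = FamilyHistoryEntry("Mother", "Breast cancer", None, "dx at age 42")
structured_only = meets_placeholder_rule([entry])
nlp_augmented = meets_placeholder_rule([augment_with_comment(entry)])
```

In this toy record the structured fields alone do not satisfy the placeholder rule, while the comment-augmented entry does. This is the kind of gap between the two algorithms that the study set out to measure.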

RESULTS

Regarding identifying the reference standard patients meeting the NCCN criteria, the NLP-augmented algorithm compared with the structured data algorithm yielded a significantly higher recall of 0.95 (95% CI 0.9-0.99) versus 0.29 (95% CI 0.19-0.40) and a precision of 0.99 (95% CI 0.96-1.00) versus 0.81 (95% CI 0.65-0.95). On the whole data set, the NLP-augmented algorithm extracted 33.6% more entities, resulting in 53.8% more patients meeting the NCCN criteria.
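For readers less familiar with the metrics, the following sketch shows how precision and recall are computed against a reference standard, along with one possible way to obtain a confidence interval. The abstract does not state how the reported intervals were derived, so the percentile bootstrap here is an assumption, and the labels are invented toy values rather than study data.

```python
import random
from typing import List, Sequence, Tuple


def precision_recall(gold: Sequence[bool], pred: Sequence[bool]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def bootstrap_recall_ci(
    gold: Sequence[bool],
    pred: Sequence[bool],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for recall (an assumed method)."""
    rng = random.Random(seed)
    n = len(gold)
    stats: List[float] = []
    for _ in range(n_resamples):
        # Resample patients with replacement and recompute recall.
        idx = [rng.randrange(n) for _ in range(n)]
        _, recall = precision_recall([gold[i] for i in idx], [pred[i] for i in idx])
        stats.append(recall)
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Toy labels (invented): True means "meets criteria".
gold = [True] * 6 + [False] * 4
pred = [True, True, True, True, False, False, True, False, False, False]

precision, recall = precision_recall(gold, pred)  # TP=4, FP=1, FN=2
ci_low, ci_high = bootstrap_recall_ci(gold, pred)
```

Low recall with reasonable precision, as reported for the structured data algorithm, corresponds to many false negatives: patients who meet the criteria but are missed because the qualifying detail is absent from the structured fields.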

CONCLUSIONS

Compared with the structured data algorithm, the NLP-augmented algorithm based on both structured and unstructured family health history data in the EHR increased the number of patients identified as meeting the NCCN criteria for genetic testing for hereditary breast or ovarian and colorectal cancers.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf19/9412758/9b8ded7f0a79/medinform_v10i8e37842_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf19/9412758/09b7e0fa068f/medinform_v10i8e37842_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf19/9412758/79458d36a4c9/medinform_v10i8e37842_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf19/9412758/c09ff95923ea/medinform_v10i8e37842_fig4.jpg
