Identification of pancreatic cancer risk factors from clinical notes using natural language processing.

Affiliations

Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA.

Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA.

Publication information

Pancreatology. 2024 Jun;24(4):572-578. doi: 10.1016/j.pan.2024.03.016. Epub 2024 Mar 26.

DOI:10.1016/j.pan.2024.03.016

PMID:38693040

Abstract

OBJECTIVES

Screening for pancreatic ductal adenocarcinoma (PDAC) is considered in high-risk individuals (HRIs) with established PDAC risk factors, such as family history and germline mutations in PDAC susceptibility genes. Accurate assessment of risk factor status is provider knowledge-dependent and requires extensive manual chart review by experts. Natural Language Processing (NLP) has shown promise in automated data extraction from the electronic health record (EHR). We aimed to use NLP for automated extraction of PDAC risk factors from unstructured clinical notes in the EHR.

METHODS

We first developed rule-based NLP algorithms to extract PDAC risk factors at the document level, using an annotated corpus of 2091 clinical notes. Next, we further improved the NLP algorithms using a cohort of 1138 patients through patient-level training, validation, and testing, with comparison against a pre-specified reference standard. To minimize false-negative results, we prioritized algorithm recall.
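
The two-stage design described above (document-level rules, then patient-level aggregation tuned for recall) can be illustrated with a minimal sketch. The abstract does not give the study's actual rules, so every pattern, word list, and function name below is a hypothetical stand-in, and the negation check is deliberately crude.

```python
import re

# Hypothetical patterns for illustration only; the study's real rules
# are not described in the abstract.
CANCER = r"pancrea\w*\s+(?:ductal\s+)?(?:cancer|adenocarcinoma|carcinoma)"
RELATIVE = r"(?:mother|father|sister|brother|aunt|uncle|grandmother|grandfather|cousin)"
FAMILY_HISTORY = re.compile(
    rf"(?:family history of\s+{CANCER}|{RELATIVE}\b[^.]*?\b{CANCER})",
    re.IGNORECASE,
)
# Crude negation cue list (an assumption, not a full negation detector).
NEGATION = re.compile(r"\b(?:no|denies|without|negative for)\b", re.IGNORECASE)


def note_mentions_family_history(note: str) -> bool:
    """Document-level rule: True if any sentence has a non-negated match."""
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        if FAMILY_HISTORY.search(sentence) and not NEGATION.search(sentence):
            return True
    return False


def patient_has_family_history(notes: list[str]) -> bool:
    """Patient-level rule favouring recall: one positive note is enough."""
    return any(note_mentions_family_history(n) for n in notes)


if __name__ == "__main__":
    notes = [
        "Routine visit. No family history of pancreatic cancer.",
        "Mother died of pancreatic cancer at 62.",
    ]
    print(patient_has_family_history(notes))
```

Flagging a patient as soon as any single note is positive trades precision for recall, which matches the stated priority of minimizing false negatives.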

RESULTS

In the test set (n = 807), the NLP algorithms achieved a recall of 0.933, precision of 0.790, and F-score of 0.856 for family history of PDAC. For germline genetic mutations, the algorithm had a high recall of 0.851, while precision and F-score were lower, at 0.350 and 0.496, respectively. Most false positives for germline mutations resulted from erroneous recognition of tissue mutations.
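
As a consistency check on the figures above, the reported F-scores can be recomputed from the reported precision and recall. This sketch assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name is illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed weighting)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract for the test set (n = 807).
family = f_score(precision=0.790, recall=0.933)
germline = f_score(precision=0.350, recall=0.851)

print(round(family, 3))    # 0.856
print(round(germline, 3))  # 0.496
```

Both values agree with the abstract under the F1 assumption. A precision of 0.350 means that roughly two of every three patients flagged for a germline mutation were false positives.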

CONCLUSIONS

Rule-based NLP algorithms applied to unstructured clinical notes are highly sensitive for automated identification of PDAC risk factors. Further validation in a large primary-care patient population is warranted to assess real-world utility in identifying HRIs for pancreatic cancer screening.


