Using natural language processing to improve efficiency of manual chart abstraction in research: the case of breast cancer recurrence.

Publication information

Am J Epidemiol. 2014 Mar 15;179(6):749-58. doi: 10.1093/aje/kwt441. Epub 2014 Jan 30.

Abstract

The increasing availability of electronic health records (EHRs) creates opportunities for automated extraction of information from clinical text. We hypothesized that natural language processing (NLP) could substantially reduce the burden of manual abstraction in studies examining outcomes, like cancer recurrence, that are documented in unstructured clinical text, such as progress notes, radiology reports, and pathology reports. We developed an NLP-based system using open-source software to process electronic clinical notes from 1995 to 2012 for women with early-stage incident breast cancers to identify whether and when recurrences were diagnosed. We developed and evaluated the system using clinical notes from 1,472 patients receiving EHR-documented care in an integrated health care system in the Pacific Northwest. A separate study provided the patient-level reference standard for recurrence status and date. The NLP-based system correctly identified 92% of recurrences and estimated diagnosis dates within 30 days for 88% of these. Specificity was 96%. The NLP-based system overlooked 5 of 65 recurrences, 4 because electronic documents were unavailable. The NLP-based system identified 5 other recurrences incorrectly classified as nonrecurrent in the reference standard. If used in similar cohorts, NLP could reduce by 90% the number of EHR charts abstracted to identify confirmed breast cancer recurrence cases at a rate comparable to traditional abstraction.
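The accuracy figures in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 65 recurrences, the 5 missed cases, and the 96% specificity come from the abstract, while the cohort size of 1,000 is a hypothetical value chosen for the example (the abstract does not give the size of the evaluation set), and the function names are our own.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of true recurrences that the system flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of non-recurrent patients that the system leaves unflagged."""
    return true_neg / (true_neg + false_pos)


def chart_reduction(true_pos: int, false_pos: int, cohort_size: int) -> float:
    """Fraction of charts that no longer need manual review if only
    NLP-flagged patients (true plus false positives) are abstracted."""
    flagged = true_pos + false_pos
    return 1 - flagged / cohort_size


# Figures stated in the abstract.
RECURRENCES = 65
MISSED = 5
REPORTED_SPECIFICITY = 0.96

# ASSUMPTION: hypothetical cohort size, not taken from the paper.
COHORT_SIZE = 1000

true_pos = RECURRENCES - MISSED                  # 60 recurrences found
non_recurrent = COHORT_SIZE - RECURRENCES        # 935 patients without recurrence
false_pos = round(non_recurrent * (1 - REPORTED_SPECIFICITY))
true_neg = non_recurrent - false_pos

sens = sensitivity(true_pos, MISSED)
spec = specificity(true_neg, false_pos)
reduction = chart_reduction(true_pos, false_pos, COHORT_SIZE)

print(f"sensitivity     = {sens:.3f}")
print(f"specificity     = {spec:.3f}")
print(f"chart reduction = {reduction:.3f}")
```

With 60 of 65 recurrences found, sensitivity comes to about 92%, matching the abstract. Under the assumed cohort size, roughly one chart in ten would still be flagged for manual review, which is in line with the reported 90% reduction; a different cohort size or recurrence rate would shift that figure.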
