Automating the Capture of Structured Pathology Data for Prostate Cancer Clinical Care and Research.

Authors

Odisho Anobel Y, Bridge Mark, Webb Mitchell, Ameli Niloufar, Eapen Renu S, Stauf Frank, Cowan Janet E, Washington Samuel L, Herlemann Annika, Carroll Peter R, Cooperberg Matthew R

Affiliations

University of California, San Francisco, San Francisco, CA.

University of California, San Francisco Medical Center, San Francisco, CA.

Publication information

JCO Clin Cancer Inform. 2019 Jul;3:1-8. doi: 10.1200/CCI.18.00084.

PMID:31314550

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6874052/

Abstract

PURPOSE

Cancer pathology findings are critical for many aspects of care but are often locked away as unstructured free text. Our objective was to develop a natural language processing (NLP) system to extract prostate pathology details from postoperative pathology reports, along with a parallel structured data entry process for use by urologists during routine clinical documentation. We then compared the accuracy of each approach with manual abstraction and assessed concordance between the NLP and clinician-entered approaches.

MATERIALS AND METHODS

From February 2016, clinicians used note templates with custom structured data elements (SDEs) during routine clinical care for men with prostate cancer. We also developed an NLP algorithm to parse radical prostatectomy pathology reports and extract structured data. We compared accuracy of clinician-entered SDEs and NLP-parsed data to manual abstraction as a gold standard and compared concordance (Cohen's κ) between approaches assuming no gold standard.
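The abstract does not describe how the NLP algorithm works internally. Purely as an illustration of what parsing a pathology report into structured data can look like, here is a minimal rule-based sketch in Python. The regular expression, the function name `extract_gleason`, and the sample report text are all assumptions made for this example, not the authors' method.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: matches phrasing such as "Gleason score 3+4=7" or
# "Gleason 4 + 3". Real reports vary far more than this pattern covers.
GLEASON_RE = re.compile(
    r"gleason(?:\s+(?:score|grade|pattern))?\s*:?\s*([1-5])\s*\+\s*([1-5])",
    re.IGNORECASE,
)


def extract_gleason(report_text: str) -> Optional[Tuple[int, int]]:
    """Return (primary, secondary) Gleason patterns from the first match, or None."""
    match = GLEASON_RE.search(report_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Invented sample text, for illustration only.
sample = "FINAL DIAGNOSIS: Prostatic adenocarcinoma, Gleason score 3+4=7."
print(extract_gleason(sample))
```

A production system would need to handle negation, multiple specimens, tertiary patterns, and site-specific report templates, none of which this sketch attempts.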

RESULTS

There were 523 patients with NLP-extracted data, 319 with SDE data, and 555 with manually abstracted data. For Gleason scores, NLP and clinician SDE accuracy was 95.6% and 95.8%, respectively, compared with manual abstraction, with concordance of 0.93 (95% CI, 0.89 to 0.98). For margin status, extracapsular extension, seminal vesicle invasion, stage, and lymph node status, NLP accuracy was 94.8% to 100%, SDE accuracy was 87.7% to 100%, and concordance between NLP and SDE ranged from 0.92 to 1.0.
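The two kinds of figures reported above, accuracy against a gold standard and Cohen's κ between two approaches, can be computed along the following lines. This is a sketch under stated assumptions: the function names and the toy labels are invented for illustration, and it does not reproduce the study's data or its confidence-interval calculation.

```python
from collections import Counter
from typing import Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of cases where the predicted label equals the gold-standard label."""
    if len(predicted) != len(gold) or len(gold) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy margin-status labels, invented for illustration only.
nlp_labels = ["pos", "neg", "neg", "pos", "neg"]
sde_labels = ["pos", "neg", "pos", "pos", "neg"]
manual_labels = ["pos", "neg", "neg", "pos", "neg"]

print(accuracy(nlp_labels, manual_labels))
print(cohens_kappa(nlp_labels, sde_labels))
```

Accuracy needs a reference standard (here, manual abstraction), whereas κ treats both approaches symmetrically, which is why the study reports κ for the comparison that assumes no gold standard.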

CONCLUSION

We show that a real-world deployment of an NLP algorithm to extract pathology data and structured data entry by clinicians during routine clinical care in a busy clinical practice can generate accurate data when compared with manual abstraction for some, but not all, components of a prostate pathology report.
