Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment.

Author information

Banerjee Imon, Li Kevin, Seneviratne Martin, Ferrari Michelle, Seto Tina, Brooks James D, Rubin Daniel L, Hernandez-Boussard Tina

Affiliations

Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), 1265 Welch Road, Stanford, California 94305-5479, USA.

Stanford University School of Medicine, 291 Campus Drive, Stanford, California 94305-5479, USA.

Publication information

JAMIA Open. 2019 Apr;2(1):150-159. doi: 10.1093/jamiaopen/ooy057. Epub 2019 Jan 4.

DOI:10.1093/jamiaopen/ooy057

PMID:31032481

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6482003/

Abstract

BACKGROUND

The population-based assessment of patient-centered outcomes (PCOs) has been limited by the difficulty of collecting these data efficiently and accurately. Natural language processing (NLP) pipelines can determine whether a clinical note within an electronic medical record contains evidence of these outcomes. We present an NLP pipeline that aims to assess the presence, absence, or risk discussion of two important PCOs following prostate cancer treatment, urinary incontinence (UI) and bowel dysfunction (BD), and demonstrate its accuracy.

METHODS

We propose a weakly supervised NLP approach that annotates clinical notes in the electronic medical record without requiring manual chart review. A weighted function of neural word embeddings was used to create a sentence-level vector representation of relevant expressions extracted from the clinical notes. These sentence vectors served as input to a multinomial logistic regression model whose output was presence, absence, or risk discussion of UI/BD. The classifier was trained on sentences annotated automatically using only domain-specific dictionaries (weak supervision).
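
The METHODS paragraph chains three steps: dictionary-based weak labeling, a weighted combination of word embeddings per sentence, and a multinomial logistic regression over those vectors. The sketch below shows that flow in plain Python under stated assumptions. The dictionaries, the three-dimensional toy embeddings, the per-word weights, the label-priority rule, and the use of a weighted average as the "weighted function" are all hypothetical stand-ins, not the study's lexicons, embedding model, or weighting scheme. The classifier is a minimal softmax regression fitted by batch gradient descent rather than any particular library's implementation.

```python
import math

# Hypothetical UI dictionaries (illustrative only, not the study's lexicons).
UI_TERMS = {"incontinence", "leakage", "pads"}
NEGATION_CUES = {"no", "denies", "without"}
RISK_CUES = {"risk", "may", "possible"}
LABELS = ["presence", "absence", "risk"]


def weak_label(tokens):
    """Dictionary-based weak label; None when the sentence has no UI term."""
    words = set(tokens)
    if not words & UI_TERMS:
        return None
    if words & RISK_CUES:  # assumed priority: risk cues override negation
        return "risk"
    if words & NEGATION_CUES:
        return "absence"
    return "presence"


def sentence_vector(tokens, embeddings, weights):
    """Weighted average of word vectors; out-of-vocabulary tokens are skipped."""
    dim = len(next(iter(embeddings.values())))
    total, norm = [0.0] * dim, 0.0
    for tok in tokens:
        if tok in embeddings:
            w = weights.get(tok, 1.0)  # default weight 1.0 (assumption)
            total = [t + w * e for t, e in zip(total, embeddings[tok])]
            norm += w
    return [t / norm for t in total] if norm else total


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train_softmax(X, y, n_classes, lr=0.5, epochs=300):
    """Multinomial logistic regression fitted by batch gradient descent."""
    dim = len(X[0]) + 1  # +1 for the bias term
    W = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(n_classes)]
        for x, target in zip(X, y):
            xb = x + [1.0]
            probs = softmax([sum(w * v for w, v in zip(row, xb)) for row in W])
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == target else 0.0)
                for j in range(dim):
                    grad[k][j] += err * xb[j]
        for k in range(n_classes):
            for j in range(dim):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def predict(W, x):
    """Index of the class with the highest linear score."""
    xb = x + [1.0]
    scores = [sum(w * v for w, v in zip(row, xb)) for row in W]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 3-d "embeddings" and weights, invented for illustration.
EMB = {
    "incontinence": [1.0, 0.0, 0.0], "leakage": [1.0, 0.0, 0.0], "pads": [1.0, 0.0, 0.0],
    "no": [0.0, 1.0, 0.0], "denies": [0.0, 1.0, 0.0],
    "risk": [0.0, 0.0, 1.0], "may": [0.0, 0.0, 1.0],
}
WEIGHTS = {"no": 2.0, "denies": 2.0, "risk": 2.0, "may": 2.0}

sentences = [
    "patient reports leakage and uses pads",
    "denies incontinence",
    "no leakage",
    "may develop incontinence",
    "risk of incontinence discussed",
    "incontinence improved",
    "blood pressure stable",
]
X, y = [], []
for sent in sentences:
    toks = sent.split()
    label = weak_label(toks)
    if label is not None:  # unlabeled sentences are not used for training
        X.append(sentence_vector(toks, EMB, WEIGHTS))
        y.append(LABELS.index(label))

W = train_softmax(X, y, len(LABELS))
print(LABELS[predict(W, sentence_vector("patient denies leakage".split(), EMB, WEIGHTS))])
```

No manually reviewed labels appear anywhere in this flow: the dictionaries alone produce the training targets, which is what makes the supervision "weak". Sentences without a UI term receive no label and are simply left out of training here.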

RESULTS

The model achieved an average F1 score of 0.86 on the sentence-level, three-tier classification task (presence/absence/risk) for both UI and BD. It also outperformed a pre-existing rule-based model for note-level annotation of UI by a significant margin.
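
The abstract reports an "average" F1 without stating how the average is taken. Assuming a macro average (the unweighted mean of per-class F1 over presence, absence, and risk), the computation can be sketched as follows; the gold and predicted labels are invented toy values, not study data.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating it as positive and all others as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=("presence", "absence", "risk")):
    """Unweighted mean of per-class F1 scores (macro average, assumed)."""
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


# Invented toy labels, not study data.
gold = ["presence", "presence", "absence", "absence", "risk", "risk"]
pred = ["presence", "absence", "absence", "absence", "risk", "presence"]
print(round(macro_f1(gold, pred), 3))  # → 0.656
```

Here the per-class F1 scores are 0.5 (presence), 0.8 (absence), and about 0.667 (risk). A weighted average would instead scale each class by its support, which can differ noticeably when one tier dominates the notes.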

CONCLUSIONS

We demonstrate a machine learning method for categorizing clinical notes by important PCOs. It trains a classifier on sentence vector representations labeled with a domain-specific dictionary, eliminating the need for manually engineered linguistic rules or manual chart review to extract PCOs. Compared with rule-based algorithms, the weakly supervised NLP pipeline showed promising sensitivity and specificity for identifying important PCOs in unstructured clinical notes.
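
The abstract does not describe how sentence-level predictions become a note-level category. One hypothetical roll-up, shown only as a sketch, assigns a note the highest-priority label found among its sentences. The priority order (presence, then absence, then risk) and the `not_mentioned` fallback are assumptions, not the authors' rule.

```python
# Hypothetical priority order for rolling sentence labels up to a note label;
# the abstract does not specify the study's note-level rule.
PRIORITY = ("presence", "absence", "risk")


def note_label(sentence_labels):
    """Return the highest-priority label found among a note's sentences."""
    found = set(sentence_labels)
    for label in PRIORITY:
        if label in found:
            return label
    return "not_mentioned"  # no sentence carried a UI/BD label


print(note_label(["risk", "absence"]))  # → absence
```

Putting presence first means that a single sentence documenting the symptom outweighs negated mentions elsewhere in the same note. Other orderings are equally plausible.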

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d35/6951970/c4a8cd141a2d/ooy057f1.jpg

Similar articles

Phenotyping severity of patient-centered outcomes using clinical notes: A prostate cancer use case.

Learn Health Syst. 2020 Jul 17;4(4):e10237. doi: 10.1002/lrh2.10237. eCollection 2020 Oct.

Natural language processing pipeline to extract prostate cancer-related information from clinical notes.

Eur Radiol. 2024 Dec;34(12):7878-7891. doi: 10.1007/s00330-024-10812-6. Epub 2024 Jun 6.

Ontology-driven and weakly supervised rare disease identification from clinical notes.

BMC Med Inform Decis Mak. 2023 May 5;23(1):86. doi: 10.1186/s12911-023-02181-9.

Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes.

J Med Internet Res. 2017 Nov 6;19(11):e380. doi: 10.2196/jmir.8344.

Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach.

BMC Med Inform Decis Mak. 2017 Dec 1;17(1):155. doi: 10.1186/s12911-017-0556-8.

Automated Identification of Patients With Immune-Related Adverse Events From Clinical Notes Using Word Embedding and Machine Learning.

JCO Clin Cancer Inform. 2021 May;5:541-549. doi: 10.1200/CCI.20.00109.

Risk prediction using natural language processing of electronic mental health records in an inpatient forensic psychiatry setting.

J Biomed Inform. 2018 Oct;86:49-58. doi: 10.1016/j.jbi.2018.08.007. Epub 2018 Aug 14.

J Biomed Inform. 2019 Feb;90:103103. doi: 10.1016/j.jbi.2019.103103. Epub 2019 Jan 9.

Using weak supervision and deep learning to classify clinical notes for identification of current suicidal ideation.

J Psychiatr Res. 2021 Apr;136:95-102. doi: 10.1016/j.jpsychires.2021.01.052. Epub 2021 Feb 2.

Cited by

Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A Review.

IEEE Trans Autom Sci Eng. 2025;22:10008-10028. doi: 10.1109/tase.2024.3515839. Epub 2024 Dec 18.

Mitigating bias in prostate cancer diagnosis using synthetic data for improved AI driven Gleason grading.

NPJ Precis Oncol. 2025 May 23;9(1):151. doi: 10.1038/s41698-025-00934-5.

Automated labelling of radiology reports using natural language processing: Comparison of traditional and newer methods.

Health Care Sci. 2023 Apr 24;2(2):120-128. doi: 10.1002/hcs2.40. eCollection 2023 Apr.

Natural language processing pipeline to extract prostate cancer-related information from clinical notes.

Eur Radiol. 2024 Dec;34(12):7878-7891. doi: 10.1007/s00330-024-10812-6. Epub 2024 Jun 6.

Natural language processing systems for extracting information from electronic health records about activities of daily living. A systematic review.

JAMIA Open. 2024 May 24;7(2):ooae044. doi: 10.1093/jamiaopen/ooae044. eCollection 2024 Jul.

Using natural language processing to analyze unstructured patient-reported outcomes data derived from electronic health records for cancer populations: a systematic review.

Expert Rev Pharmacoecon Outcomes Res. 2024 Apr;24(4):467-475. doi: 10.1080/14737167.2024.2322664. Epub 2024 Mar 5.

Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review.

Artif Intell Med. 2023 Dec;146:102701. doi: 10.1016/j.artmed.2023.102701. Epub 2023 Nov 1.

Weakly supervised spatial relation extraction from radiology reports.

JAMIA Open. 2023 Apr 22;6(2):ooad027. doi: 10.1093/jamiaopen/ooad027. eCollection 2023 Jul.

Developing a Data and Analytics Platform to Enable a Breast Cancer Learning Health System at a Regional Cancer Center.

JCO Clin Cancer Inform. 2023 Mar;7:e2200182. doi: 10.1200/CCI.22.00182.

Machine learning approaches for electronic health records phenotyping: a methodical review.

J Am Med Inform Assoc. 2023 Jan 18;30(2):367-381. doi: 10.1093/jamia/ocac216.
