Natural Language Processing to Ascertain Cancer Outcomes From Medical Oncologist Notes.

Affiliations

Dana-Farber Cancer Institute, Boston, MA.

Harvard Medical School, Boston, MA.

Publication information

JCO Clin Cancer Inform. 2020 Aug;4:680-690. doi: 10.1200/CCI.20.00020.

PMID:32755459

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7469582/

Abstract

PURPOSE

Cancer research using electronic health records and genomic data sets requires clinical outcomes data, which may be recorded only in unstructured text by treating oncologists. Natural language processing (NLP) could substantially accelerate extraction of this information.

METHODS

Patients with lung cancer who had tumor sequencing as part of a single-institution precision oncology study from 2013 to 2018 were identified. Medical oncologists' progress notes for these patients were reviewed. For each note, curators recorded whether the assessment/plan indicated any cancer, progression/worsening of disease, and/or response to therapy or improving disease. Next, a recurrent neural network was trained using unlabeled notes to extract the assessment/plan from each note. Finally, convolutional neural networks were trained on labeled assessments/plans to predict the probability that each curated outcome was present. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) among a held-out test set of 10% of patients. Associations between curated response or progression end points and overall survival were measured using Cox models among patients receiving palliative-intent systemic therapy.
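The evaluation step described above can be illustrated with a minimal Python sketch. Following the abstract's wording, the hold-out is drawn by patient, so no patient contributes notes to both sets. Everything else is an assumption made for illustration: the random shuffle with a fixed seed, the pairwise (Mann-Whitney) form of the AUROC, the function names `split_by_patient` and `auroc`, and the toy data. It is not the authors' code.

```python
import random


def split_by_patient(note_patient_ids, test_fraction=0.10, seed=0):
    """Split note indices so that each patient lands entirely in train or test.

    note_patient_ids: one patient identifier per note (hypothetical input format).
    Returns (train_idx, test_idx) as lists of note indices.
    """
    patients = sorted(set(note_patient_ids))
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train_idx = [i for i, p in enumerate(note_patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(note_patient_ids) if p in test_patients]
    return train_idx, test_idx


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 20 notes from 10 hypothetical patients, two notes each.
toy_patient_ids = [f"patient_{i // 2}" for i in range(20)]
train_idx, test_idx = split_by_patient(toy_patient_ids)

# Toy curated labels and model probabilities for four notes.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(auroc(toy_labels, toy_scores))  # prints 0.75
```

The pairwise loop is quadratic in the number of notes, which is acceptable for a sketch; a rank-based implementation would be the usual choice at the scale of several thousand notes.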

RESULTS

Medical oncologist notes (n = 7,597) were manually curated for 919 patients. In the 10% test set, NLP models replicated human curation with AUROCs of 0.94 for the any-cancer outcome, 0.86 for the progression outcome, and 0.90 for the response outcome. Progression/worsening events identified using NLP models were associated with shortened survival (hazard ratio [HR] for mortality, 2.49; 95% CI, 2.00 to 3.09); response/improvement events were associated with improved survival (HR, 0.45; 95% CI, 0.30 to 0.67).
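As a reading aid for the hazard ratios above, the sketch below back-calculates the standard error of the log hazard ratio and the corresponding z statistic from each reported HR and its 95% CI. These derived values are not reported in the abstract. The calculation assumes the intervals are Wald intervals, symmetric on the log scale, which is the usual form for Cox model output but is not stated in the source. The function name `wald_summary` and the dictionary keys are hypothetical.

```python
import math
from statistics import NormalDist


def wald_summary(hr, ci_low, ci_high, level=0.95):
    """Recover log-scale quantities from a hazard ratio and its confidence interval.

    Assumes the interval is exp(log(HR) +/- z_crit * SE), i.e. a Wald interval.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% CI
    se_log_hr = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z_stat = math.log(hr) / se_log_hr
    # Under the Wald assumption the point estimate sits at the geometric mean
    # of the interval bounds; this serves as a consistency check.
    ci_geometric_mean = math.sqrt(ci_low * ci_high)
    return {
        "se_log_hr": se_log_hr,
        "z_stat": z_stat,
        "ci_geometric_mean": ci_geometric_mean,
    }


# Values as reported in RESULTS.
progression = wald_summary(2.49, 2.00, 3.09)
response = wald_summary(0.45, 0.30, 0.67)

for name, summary in (("progression", progression), ("response", response)):
    print(name, {key: round(value, 3) for key, value in summary.items()})
```

For both outcomes the geometric mean of the interval bounds falls within rounding of the reported HR, which is consistent with the Wald assumption. A positive z statistic corresponds to increased mortality hazard (progression) and a negative one to decreased hazard (response).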

CONCLUSION

NLP models based on neural networks can extract meaningful outcomes from oncologist notes at scale. Such models may facilitate identification of clinical and genomic features associated with response to cancer treatment.
