A Deep Learning-Enabled Workflow to Estimate Real-World Progression-Free Survival in Patients With Metastatic Breast Cancer: Study Using Deidentified Electronic Health Records.

Authors

Varma Gowtham, Yenukoti Rohit Kumar, Kumar M Praveen, Ashrit Bandlamudi Sai, Purushotham K, Subash C, Ravi Sunil Kumar, Kurien Verghese, Aman Avinash, Manoharan Mithun, Jaiswal Shashank, Anand Akash, Barve Rakesh, Thiagarajan Viswanathan, Lenehan Patrick, Soefje Scott A, Soundararajan Venky

Affiliations

Department of Clinical Sciences, Nference, 4th Floor, Indiqube, Golf View Campus Tower-2, 22, 3rd Cross Rd, Murugeshpalya, S R Layout, Bangalore, 560017, India, 91 8728831787.

Department of Data Science and Engineering, Nference, Bangalore, India.

Publication information

JMIR Cancer. 2025 May 15;11:e64697. doi: 10.2196/64697.

DOI:10.2196/64697

PMID:40372953

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12097284/

Abstract

BACKGROUND

Progression-free survival (PFS) is a crucial endpoint in cancer drug research. Clinician-confirmed cancer progression documented in unstructured text (ie, clinical notes), the basis of real-world PFS (rwPFS), serves as a reasonable surrogate for ascertaining progression endpoints in real-world data. Response Evaluation Criteria in Solid Tumors (RECIST), traditionally applied in clinical trials through serial imaging evaluations, is impractical for real-world data. Manual abstraction of clinical progression from unstructured notes remains the gold standard, but it is resource-intensive and time-consuming. In recent years, natural language processing (NLP), a subdomain of machine learning, has shown promise in accelerating the extraction of tumor progression from real-world data.

OBJECTIVES

We aim to configure a pretrained, general-purpose health care NLP framework to transform free-text clinical notes and radiology reports into structured progression events for studying rwPFS in metastatic breast cancer (mBC) cohorts.

METHODS

This study developed and validated a novel semiautomated workflow to estimate rwPFS in patients with mBC using deidentified electronic health record data from the Nference nSights platform. The workflow was validated in a cohort of 316 patients with hormone receptor-positive, human epidermal growth factor receptor 2 (HER2)-negative mBC who started palbociclib and letrozole combination therapy between January 2015 and December 2021. Ground-truth datasets were curated to evaluate the workflow's performance at both the sentence and patient levels. NLP-captured progression or a change in therapy line was considered an outcome event, while death, loss to follow-up, and end of the study period were considered censoring events for rwPFS computation. Peak reduction and cumulative decline in Patient Health Questionnaire-8 (PHQ-8) scores were analyzed in the progressed and nonprogressed patient subgroups.
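
The event and censoring rules described above map onto a standard time-to-event setup. The sketch below is a minimal illustration, assuming a Kaplan-Meier estimator (the abstract does not name the estimator used) and hypothetical field names and reason labels (`months`, `reason`, `"therapy_change"`, and so on). It is not the authors' implementation, and the toy cohort is made up.

```python
from dataclasses import dataclass

# Outcome and censoring categories as described in the Methods; the string
# labels themselves are hypothetical.
EVENT_REASONS = {"progression", "therapy_change"}
CENSOR_REASONS = {"death", "lost_to_follow_up", "end_of_study"}


@dataclass
class FollowUp:
    months: float  # time from therapy start to the first qualifying record
    reason: str    # one of EVENT_REASONS or CENSOR_REASONS


def kaplan_meier(records):
    """Return [(time, survival_probability)] at each distinct event time."""
    for r in records:
        if r.reason not in EVENT_REASONS | CENSOR_REASONS:
            raise ValueError(f"unknown reason: {r.reason}")
    at_risk = len(records)
    survival = 1.0
    curve = []
    for t in sorted({r.months for r in records}):
        at_t = [r for r in records if r.months == t]
        events = sum(1 for r in at_t if r.reason in EVENT_REASONS)
        if events:
            # Product-limit step: scale survival by the fraction still event-free.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Everyone observed at t (event or censored) leaves the risk set.
        at_risk -= len(at_t)
    return curve


def median_rwpfs(records):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in kaplan_meier(records):
        if s <= 0.5:
            return t
    return None


# Toy cohort (made-up numbers, not study data).
cohort = [
    FollowUp(6, "progression"),
    FollowUp(10, "end_of_study"),
    FollowUp(12, "therapy_change"),
    FollowUp(18, "death"),
    FollowUp(20, "progression"),
    FollowUp(30, "lost_to_follow_up"),
]
print(median_rwpfs(cohort))
```

Note that death is treated here as censoring rather than as an event, following the definition given in the Methods; changing that choice only requires moving `"death"` between the two sets.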

RESULTS

The configured clinical NLP engine achieved a sentence-level progression capture accuracy of 98.2%. At the patient level, initial progression was captured within ±30 days with 88% accuracy. The median rwPFS for the study cohort (N=316) was 20 (95% CI 18-25) months. In a validation subset (n=100), rwPFS determined by manual curation was 25 (95% CI 15-35) months, closely aligning with the computational workflow's 22 (95% CI 15-35) months. A subanalysis revealed rwPFS estimates of 30 (95% CI 24-39) months from radiology reports and 23 (95% CI 19-28) months from clinical notes, highlighting the importance of integrating multiple note sources. External validation also demonstrated high accuracy (92.5% sentence level; 90.2% patient level). Sensitivity analysis revealed stable rwPFS estimates across varying levels of missing source data and event definitions. Peak reduction in PHQ-8 scores during the study period highlighted significant associations between patient-reported outcomes and disease progression.
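
The patient-level figure above can be read as a date-concordance rate. The following is a minimal sketch under one plausible definition: a patient counts as correct when the workflow's first progression date falls within 30 days of the manually curated date, or when both sources record no progression. The function name, the handling of patients without progression, and the example dates are assumptions, not details from the abstract.

```python
from datetime import date


def patient_level_accuracy(predicted, reference, window_days=30):
    """Fraction of patients whose predicted first-progression date agrees
    with the reference date within +/- window_days.

    Both arguments map patient ID -> date, or None when no progression
    was recorded. Only patients present in the reference are scored.
    """
    if not reference:
        raise ValueError("reference set is empty")
    correct = 0
    for patient_id, ref_date in reference.items():
        pred_date = predicted.get(patient_id)
        if ref_date is None or pred_date is None:
            # Agreement only if both sources report no progression.
            correct += ref_date is None and pred_date is None
        elif abs((pred_date - ref_date).days) <= window_days:
            correct += 1
    return correct / len(reference)


# Made-up example dates, not study data.
reference = {
    "p1": date(2019, 3, 1),
    "p2": date(2020, 6, 15),
    "p3": None,
    "p4": date(2021, 1, 10),
}
predicted = {
    "p1": date(2019, 3, 20),   # 19 days late: within the window
    "p2": date(2020, 8, 1),    # 47 days late: outside the window
    "p3": None,                # both agree on no progression
    "p4": date(2020, 12, 20),  # 21 days early: within the window
}
print(patient_level_accuracy(predicted, reference))
```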

CONCLUSIONS

This workflow enables rapid and reliable determination of rwPFS in patients with mBC receiving combination therapy. Further validation across more diverse external datasets and other cancer types is needed to ensure broader applicability and generalizability.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d255/12097284/19595206ccf3/cancer-v11-e64697-g001.jpg
