Natural Language Processing to Ascertain Cancer Outcomes From Medical Oncologist Notes.

Affiliations

Dana-Farber Cancer Institute, Boston, MA.

Harvard Medical School, Boston, MA.

Publication information

JCO Clin Cancer Inform. 2020 Aug;4:680-690. doi: 10.1200/CCI.20.00020.

Abstract

PURPOSE

Cancer research using electronic health records and genomic data sets requires clinical outcomes data, which may be recorded only in unstructured text by treating oncologists. Natural language processing (NLP) could substantially accelerate extraction of this information.

METHODS

Patients with lung cancer who had tumor sequencing as part of a single-institution precision oncology study from 2013 to 2018 were identified. Medical oncologists' progress notes for these patients were reviewed. For each note, curators recorded whether the assessment/plan indicated any cancer, progression/worsening of disease, and/or response to therapy or improving disease. Next, a recurrent neural network was trained using unlabeled notes to extract the assessment/plan from each note. Finally, convolutional neural networks were trained on labeled assessments/plans to predict the probability that each curated outcome was present. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) among a held-out test set of 10% of patients. Associations between curated response or progression end points and overall survival were measured using Cox models among patients receiving palliative-intent systemic therapy.
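
The evaluation design described above (a test set held out by patient, scored with AUROC) can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_by_patient` and `auroc`, the random seed, and the shuffling strategy are assumptions made for illustration. The key point it shows is that all notes from one patient land on the same side of the split, so the test set contains only unseen patients.

```python
import random


def split_by_patient(note_patient_ids, test_fraction=0.10, seed=0):
    """Split note indices so that every patient's notes fall entirely in
    train or entirely in test (hypothetical helper, not from the paper)."""
    patients = sorted(set(note_patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    train_idx = [i for i, p in enumerate(note_patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(note_patient_ids) if p in test_patients]
    return train_idx, test_idx


def auroc(labels, scores):
    """AUROC computed as the probability that a randomly chosen positive
    note receives a higher score than a randomly chosen negative note,
    with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUROC above is quadratic in the number of notes, which is fine for a sketch; a rank-based implementation from a standard library would be the usual choice at the scale reported here.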

RESULTS

Medical oncologist notes (n = 7,597) were manually curated for 919 patients. In the 10% test set, NLP models replicated human curation with AUROCs of 0.94 for the any-cancer outcome, 0.86 for the progression outcome, and 0.90 for the response outcome. Progression/worsening events identified using NLP models were associated with shortened survival (hazard ratio [HR] for mortality, 2.49; 95% CI, 2.00 to 3.09); response/improvement events were associated with improved survival (HR, 0.45; 95% CI, 0.30 to 0.67).
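
As a reading aid for the hazard ratios above, the sketch below converts a reported HR and its 95% CI to the log scale. It assumes the intervals are Wald-type, meaning symmetric around log(HR), which is common for Cox models but is not stated in the abstract. The function name `log_scale_summary` is hypothetical. Under that assumption, the standard error of log(HR) can be recovered from the interval width, and the geometric mean of the bounds should land close to the reported point estimate.

```python
import math


def log_scale_summary(hr, lower, upper, z=1.96):
    """Return (log HR, approximate SE of log HR, geometric mean of the CI bounds).

    Assumes a Wald-type 95% CI that is symmetric on the log scale
    (an assumption, not something the abstract states)."""
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    geometric_mid = math.sqrt(lower * upper)
    return log_hr, se, geometric_mid


# Values as reported in RESULTS
progression = log_scale_summary(2.49, 2.00, 3.09)
response = log_scale_summary(0.45, 0.30, 0.67)
```

For both outcomes, the geometric mean of the bounds agrees with the reported HR to within rounding, which is consistent with the log-scale assumption.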

CONCLUSION

NLP models based on neural networks can extract meaningful outcomes from oncologist notes at scale. Such models may facilitate identification of clinical and genomic features associated with response to cancer treatment.
