Extraction and Imputation of Eastern Cooperative Oncology Group Performance Status From Unstructured Oncology Notes Using Language Models.

Affiliations

Dana-Farber Cancer Institute, Boston, MA.

Harvard Medical School, Boston, MA.

Publication information

JCO Clin Cancer Inform. 2024 May;8:e2300269. doi: 10.1200/CCI.23.00269.

Abstract

PURPOSE

Eastern Cooperative Oncology Group (ECOG) performance status (PS) is a key clinical variable for cancer treatment and research, but it is usually only recorded in unstructured form in the electronic health record. We investigated whether natural language processing (NLP) models can impute ECOG PS using unstructured note text.

MATERIALS AND METHODS

Medical oncology notes were identified from all patients with cancer at our center from 1997 to 2023 and divided at the patient level into training (approximately 80%), tuning/validation (approximately 10%), and test (approximately 10%) sets. Regular expressions were used to extract explicitly documented PS. Extracted PS labels were used to train NLP models to impute ECOG PS (0-1 vs 2-4) from the remainder of the notes (with regular expression-extracted PS documentation removed). We assessed associations between imputed PS and overall survival (OS).
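The labeling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the abstract does not give the actual regular expressions or the splitting procedure, so the pattern `ECOG_PATTERN`, the helper names `extract_and_mask` and `assign_split`, and the hash-based split are all hypothetical.

```python
import hashlib
import re

# Hypothetical pattern (assumption): matches phrases such as "ECOG PS: 2" or
# "ECOG performance status is 1". The study's real expressions are not published here.
ECOG_PATTERN = re.compile(
    r"\bECOG(?:\s+performance\s+status|\s+PS)?\s*(?:is|of|[:=-])?\s*([0-4])\b",
    re.IGNORECASE,
)


def extract_and_mask(note: str):
    """Return (binary_label, masked_note), or (None, note) if no PS is documented.

    binary_label is 0 for ECOG PS 0-1 and 1 for ECOG PS 2-4.
    """
    match = ECOG_PATTERN.search(note)
    if match is None:
        return None, note
    score = int(match.group(1))
    label = 1 if score >= 2 else 0
    # Remove every explicit mention so a model must infer PS from the remaining text.
    masked = ECOG_PATTERN.sub(" ", note)
    return label, masked


def assign_split(patient_id: str) -> str:
    """Assign a patient (not an individual note) to a split, roughly 80/10/10.

    Hashing the patient identifier keeps all notes from one patient in the same split.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10
    if bucket < 8:
        return "train"
    return "validation" if bucket == 8 else "test"
```

Splitting by patient rather than by note matters because one patient typically has many notes; a note-level split would let near-duplicate text from the same patient appear in both training and test sets.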

RESULTS

ECOG PS was extracted using regular expressions from 495,862 notes, corresponding to 79,698 patients. A Transformer-based Longformer model imputed PS with high discrimination (test set area under the receiver operating characteristic curve 0.95, area under the precision-recall curve 0.73). Imputed poor PS was associated with worse OS, including among notes with no explicit documentation of PS detected (OS hazard ratio, 11.9; 95% CI, 11.1 to 12.8).
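For readers unfamiliar with the two discrimination metrics reported above, the sketch below shows how each can be computed from binary labels and model scores. It is illustrative only: the abstract does not state which estimator of the area under the precision-recall curve was used, so average precision is assumed here, and the function names are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, one common estimator of the area under the
    precision-recall curve (assumed here). Tied scores are not handled specially."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive example is required")
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos
```

Unlike the area under the ROC curve, average precision depends on how common the positive class is, which is one reason the two reported values (0.95 and 0.73) can differ substantially when poor PS is the minority class.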

CONCLUSION

NLP models can be used to impute performance status from unstructured oncologist notes at scale. This may aid the annotation of oncology data sets for clinical outcomes research and cancer care delivery.
