Evaluation of a Natural Language Processing Model to Identify and Characterize Patients in the United States With High-Risk Non-Muscle-Invasive Bladder Cancer.

Affiliations

Emory University School of Medicine, Grady Memorial Hospital, Atlanta, GA.

Weill Cornell Medical College, New York, NY.

Publication information

JCO Clin Cancer Inform. 2023 Sep;7:e2300096. doi: 10.1200/CCI.23.00096.

DOI:10.1200/CCI.23.00096

PMID:37906722

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10642898/

Abstract

PURPOSE

Treatment of non-muscle-invasive bladder cancer (NMIBC) is guided by risk stratification using clinical and pathologic criteria. This study aimed to develop a natural language processing (NLP) model for identifying patients with high-risk NMIBC retrospectively from unstructured electronic medical records (EMRs) and to apply the model to describe patient and tumor characteristics.

METHODS

We used three independent EMR-derived data sets including adult patients with a bladder cancer diagnosis in 2011-2020 for NLP model development and training (n = 140), validation (n = 697), and application for the retrospective cohort analysis (n = 4,402). Deep learning methods were used to train NLP recognition of medical chart terminology to identify seven high-risk NMIBC criteria; model performance was assessed using the F1 score, weighted across features. An algorithm was then used to classify each patient as high-risk NMIBC (yes/no). Manually reviewed records served as the gold standard.
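As an illustration of an F1 score "weighted across features", the sketch below computes per-feature F1 from true-positive, false-positive, and false-negative counts and averages them weighted by each feature's support (number of true instances). The abstract does not state the exact weighting scheme, so support weighting is an assumption here, and the feature counts are hypothetical values chosen only for the example.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(counts: dict[str, tuple[int, int, int]]) -> float:
    """Support-weighted mean of per-feature F1 scores.

    counts maps a feature name to (tp, fp, fn).
    Support for a feature is tp + fn (its number of true instances).
    """
    total_support = sum(tp + fn for tp, _fp, fn in counts.values())
    if total_support == 0:
        return 0.0
    weighted_sum = sum(
        (tp + fn) * f1_score(tp, fp, fn) for tp, fp, fn in counts.values()
    )
    return weighted_sum / total_support


# Hypothetical counts, NOT taken from the study.
example_counts = {
    "T1": (40, 10, 10),   # F1 = 0.8, support = 50
    "CIS": (10, 10, 10),  # F1 = 0.5, support = 20
}
overall = weighted_f1(example_counts)
print(round(overall, 3))
```

With support weighting, a common feature contributes more to the overall score than a rare one, which is one reason a weak score on an uncommon feature can coexist with a good weighted score.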

RESULTS

The F1 scores after model training were >0.7 for all but one uncommon feature (prostatic urethral involvement). The highest area under the receiver operating characteristic curve (AUC) was observed for Ta (0.897) and T1 (0.897); the lowest AUC was for carcinoma in situ (CIS; 0.617). For high-risk NMIBC classification, positive predictive value was 79.4%, negative predictive value was 93.2%, and false-positive rate was 8.9%. Sensitivity and specificity were 83.7% and 91.1%, respectively. Of 748 patients manually confirmed as having high-risk NMIBC, 196 (26%) had CIS (of whom 19% also had T1 and 23% also had Ta disease); 552 tumors (74%) had no associated CIS.
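The classification metrics reported above all derive from a 2x2 confusion matrix comparing the model's yes/no output with the manually reviewed gold standard. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not the study's data. It assumes every denominator is non-zero.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard 2x2 confusion-matrix metrics (assumes non-zero denominators)."""
    return {
        "sensitivity": tp / (tp + fn),          # true-positive rate
        "specificity": tn / (tn + fp),          # true-negative rate
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }


# Hypothetical counts, NOT taken from the study.
metrics = binary_metrics(tp=80, fp=20, fn=20, tn=180)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because the false-positive rate is the complement of specificity, the reported values are internally consistent: 100% - 91.1% = 8.9%.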

CONCLUSION

The NLP model, combined with a rule-based algorithm, identified high-risk NMIBC with good performance and will enable future work to study real-world treatment patterns and clinical outcomes for high-risk NMIBC.
