Validation of Non-Small Cell Lung Cancer Clinical Insights Using a Generalized Oncology Natural Language Processing Model.

Affiliations

Optum Insight, Optum, Eden Prairie, MN.

Departments of Neurology and Population Health, New York University Grossman School of Medicine, New York, NY.

Publication information

JCO Clin Cancer Inform. 2024 Sep;8:e2300099. doi: 10.1200/CCI.23.00099.

DOI: 10.1200/CCI.23.00099
PMID: 39230200
Abstract

PURPOSE

Few studies have used natural language processing (NLP) in the context of non-small cell lung cancer (NSCLC). This study aimed to validate the application of an NLP model to an NSCLC cohort by extracting NSCLC concepts from free-text medical notes and converting them to structured, interpretable data.

METHODS

Patients with a lung neoplasm, NSCLC histology, and treatment information in their notes were selected from a repository of over 27 million patients. From these, 200 were randomly selected for this study with the longest and the most recent note included for each patient. An NLP model developed and validated on a large solid and blood cancer oncology cohort was applied to this NSCLC cohort. Two certified tumor registrars and a curator abstracted concepts from the notes: neoplasm, histology, stage, TNM values, and metastasis sites. This manually abstracted gold standard was compared with the NLP model output. Precision and recall scores were calculated.
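The comparison between the manually abstracted gold standard and the NLP output can be illustrated with a minimal sketch. It assumes each extracted concept is represented as a (patient ID, concept type, value) triple and that a prediction counts as correct only on an exact match; the abstract does not describe the matching rules actually used, and the function name `precision_recall` and the sample records are hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall(
    gold: Iterable[Hashable], predicted: Iterable[Hashable]
) -> Tuple[float, float]:
    """Compare predicted concepts with a gold standard by exact match.

    Returns (precision, recall). Either value is 0.0 when its
    denominator is zero (an assumption made for this sketch).
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)   # concepts found by both
    fp = len(pred_set - gold_set)   # extracted, but not in the gold standard
    fn = len(gold_set - pred_set)   # in the gold standard, but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical records: (patient_id, concept_type, value)
gold = {
    ("p1", "histology", "adenocarcinoma"),
    ("p1", "stage", "IV"),
    ("p2", "histology", "squamous cell carcinoma"),
    ("p2", "metastasis_site", "brain"),
}
predicted = {
    ("p1", "histology", "adenocarcinoma"),
    ("p1", "stage", "IV"),
    ("p2", "metastasis_site", "brain"),
    ("p2", "metastasis_site", "liver"),  # false positive
}

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example the model misses one gold-standard concept and adds one spurious concept, so both scores fall below 100%. In practice the scores would be computed separately per concept type, as the study reports them.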

RESULTS

The NLP model extracted the NSCLC concepts with high precision and recall. Precision and recall, respectively, were: lung neoplasm 100% and 100%, NSCLC histology 99% and 88%, histology correctly linked to neoplasm 98% and 79%, stage value 98.8% and 92%, stage TNM value 93% and 98%, and metastasis site 97% and 89%. High precision reflects a low number of false positives, so extracted concepts are likely accurate. High recall indicates that the model captured most of the desired concepts.
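For reference, the standard definitions of the two metrics are given below. The abstract does not spell them out, so this is the conventional formulation rather than a quotation from the study; TP, FP, and FN denote true positives, false positives, and false negatives.

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
```

With few false positives the precision denominator stays close to TP, and with few false negatives the recall denominator does the same, which is why each score approaches 100% in those cases.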

CONCLUSION

This study validates that Optum's oncology NLP model has high precision and recall on real-world clinical data and is a reliable model to support research studies and clinical trials. It also shows that our nonspecific solid tumor and blood cancer oncology model generalizes to extracting clinical information from specific cancer cohorts.
