Validity of Natural Language Processing for Ascertainment of EGFR and ALK Test Results in SEER Cases of Stage IV Non-Small-Cell Lung Cancer.

Author information

Goulart Bernardo Haddock Lobo, Silgard Emily T, Baik Christina S, Bansal Aasthaa, Sun Qin, Durbin Eric B, Hands Isaac, Shah Darshil, Arnold Susanne M, Ramsey Scott D, Kavuluru Ramakanth, Schwartz Stephen M

Affiliations

Fred Hutchinson Cancer Research Center, Seattle, WA.

University of Washington, Seattle, WA.

Publication information

JCO Clin Cancer Inform. 2019 May;3:1-15. doi: 10.1200/CCI.18.00098.

DOI:10.1200/CCI.18.00098

PMID:31058542

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6874053/

Abstract

PURPOSE

SEER registries do not report results of epidermal growth factor receptor (EGFR) and anaplastic lymphoma kinase (ALK) mutation tests. To facilitate population-based research in molecularly defined subgroups of non-small-cell lung cancer (NSCLC), we assessed the validity of natural language processing (NLP) for the ascertainment of EGFR and ALK testing from electronic pathology (e-path) reports of NSCLC cases included in two SEER registries: the Cancer Surveillance System (CSS) and the Kentucky Cancer Registry (KCR).

METHODS

We obtained 4,278 e-path reports from 1,634 patients who were diagnosed with stage IV nonsquamous NSCLC from September 1, 2011, to December 31, 2013, included in CSS. We used 855 CSS reports to train NLP systems for the ascertainment of EGFR and ALK test status (reported vs not reported) and test results (positive vs negative). We assessed sensitivity, specificity, and positive and negative predictive values in an internal validation sample of 3,423 CSS e-path reports and repeated the analysis in an external sample of 1,041 e-path reports from 565 KCR patients. Two oncologists manually reviewed all e-path reports to generate gold-standard data sets.

RESULTS

NLP systems yielded internal validity metrics that ranged from 0.95 to 1.00 for EGFR and ALK test status and results in CSS e-path reports. NLP showed high internal accuracy for the ascertainment of EGFR and ALK in CSS patients, with F scores of 0.95 and 0.96, respectively. In the external validation analysis, NLP yielded metrics that ranged from 0.02 to 0.96 in KCR reports and F scores of 0.70 and 0.72, respectively, in KCR patients.
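For readers less familiar with the validity metrics named in this abstract, the following is a minimal sketch of how sensitivity, specificity, positive and negative predictive values, and an F score can be derived from a 2×2 comparison of NLP output against a gold standard. It is not the authors' code. The counts are hypothetical, the names `Confusion` and `validity_metrics` are illustrative, and the F score is assumed here to be the balanced F1 (the abstract does not state which variant was used).

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 counts of NLP output versus manual gold-standard review."""
    tp: int  # NLP positive, gold standard positive
    fp: int  # NLP positive, gold standard negative
    fn: int  # NLP negative, gold standard positive
    tn: int  # NLP negative, gold standard negative


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def validity_metrics(c: Confusion) -> dict:
    """Compute the standard validity metrics from confusion counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)
    specificity = _ratio(c.tn, c.tn + c.fp)
    ppv = _ratio(c.tp, c.tp + c.fp)
    npv = _ratio(c.tn, c.tn + c.fn)
    # Assumption: "F score" is taken as F1, the harmonic mean of PPV and
    # sensitivity, written in its count form 2TP / (2TP + FP + FN).
    f1 = _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = Confusion(tp=90, fp=10, fn=10, tn=890)
metrics = validity_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because each metric depends on different cells of the table, a system can score well on some metrics and poorly on others, which is consistent with the wide 0.02 to 0.96 range reported for the external KCR sample.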

CONCLUSION

NLP is an internally valid method for the ascertainment of EGFR and ALK test information from e-path reports available in SEER registries, but future work is necessary to increase NLP external validity.
