Development of a Natural Language Processing Engine to Generate Bladder Cancer Pathology Data for Health Services Research.

Author information

Schroeck Florian R, Patterson Olga V, Alba Patrick R, Pattison Erik A, Seigne John D, DuVall Scott L, Robertson Douglas J, Sirovich Brenda, Goodney Philip P

Affiliations

VA Outcomes Group, White River Junction VA Medical Center, White River Junction, VT; Section of Urology, Dartmouth Hitchcock Medical Center, Lebanon, NH; Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, NH; The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Hanover, NH.

Department of Internal Medicine, VA Salt Lake City Health Care System and University of Utah, Salt Lake City, UT.

Publication information

Urology. 2017 Dec;110:84-91. doi: 10.1016/j.urology.2017.07.056. Epub 2017 Sep 12.

Abstract

OBJECTIVE

To take the first step toward assembling population-based cohorts of patients with bladder cancer with longitudinal pathology data, we developed and validated a natural language processing (NLP) engine that abstracts pathology data from full-text pathology reports.

METHODS

Using 600 bladder pathology reports randomly selected from the Department of Veterans Affairs, we developed and validated an NLP engine to abstract data on histology, invasion (presence vs absence and depth), grade, the presence of muscularis propria, and the presence of carcinoma in situ. Our gold standard was based on an independent review of reports by 2 urologists, followed by adjudication. We assessed the NLP performance by calculating the accuracy, the positive predictive value, and the sensitivity. We subsequently applied the NLP engine to pathology reports from 10,725 patients with bladder cancer.
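The three performance measures named above can be illustrated for a single binary variable, such as the presence vs absence of carcinoma in situ. The following is a minimal sketch and not the authors' code: the function name `binary_metrics` and the toy labels are hypothetical and chosen only for illustration, and the abstract does not state how multi-category variables such as depth of invasion were scored.

```python
from typing import Dict, Optional, Sequence


def binary_metrics(gold: Sequence[bool],
                   predicted: Sequence[bool]) -> Dict[str, Optional[float]]:
    """Compare predicted labels with gold-standard labels for one binary variable.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count the four cells of the 2x2 confusion matrix.
    tp = sum(g and p for g, p in zip(gold, predicted))
    tn = sum((not g) and (not p) for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    total = tp + tn + fp + fn

    return {
        # Share of all reports where the prediction matches the gold standard.
        "accuracy": (tp + tn) / total if total else None,
        # Share of predicted positives that are truly positive.
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        # Share of true positives that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
    }


# Toy labels, invented for this example (not data from the study).
gold = [True, True, True, False, False, False, False, True]
nlp = [True, True, False, False, False, True, False, True]
metrics = binary_metrics(gold, nlp)
print(metrics)
```

Each measure is returned as `None` when its denominator is zero, so that an undefined value is not mistaken for a score of 0.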

RESULTS

When comparing the NLP output to the gold standard, NLP achieved the highest accuracy (0.98) for the presence vs the absence of carcinoma in situ. Accuracy for histology, invasion (presence vs absence), grade, and the presence of muscularis propria ranged from 0.83 to 0.96. The most challenging variable was depth of invasion (accuracy 0.68), with an acceptable positive predictive value for lamina propria (0.82) and for muscularis propria (0.87) invasion. The validated engine was capable of abstracting pathologic characteristics for 99% of the patients with bladder cancer.

CONCLUSION

NLP had high accuracy for 5 of 6 variables and abstracted data for the vast majority of the patients. This now allows for the assembly of population-based cohorts with longitudinal pathology data.
