Odisho Anobel Y, Bridge Mark, Webb Mitchell, Ameli Niloufar, Eapen Renu S, Stauf Frank, Cowan Janet E, Washington Samuel L, Herlemann Annika, Carroll Peter R, Cooperberg Matthew R
University of California, San Francisco, San Francisco, CA.
University of California, San Francisco Medical Center, San Francisco, CA.
JCO Clin Cancer Inform. 2019 Jul;3:1-8. doi: 10.1200/CCI.18.00084.
Cancer pathology findings are critical for many aspects of care but are often locked away as unstructured free text. Our objective was to develop a natural language processing (NLP) system to extract prostate pathology details from postoperative pathology reports, and a parallel structured data entry process for use by urologists during routine clinical documentation; to compare the accuracy of each approach with manual abstraction; and to assess concordance between the NLP and clinician-entered approaches.
From February 2016, clinicians used note templates with custom structured data elements (SDEs) during routine clinical care for men with prostate cancer. We also developed an NLP algorithm to parse radical prostatectomy pathology reports and extract structured data. We compared the accuracy of clinician-entered SDEs and NLP-parsed data against manual abstraction as the gold standard, and we assessed concordance between the two approaches with Cohen's κ, which assumes no gold standard.
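The abstract does not describe how the NLP algorithm parses reports. Purely as an illustration, the following sketch assumes a simple rule-based (regular-expression) approach over synoptic-style report text. The patterns, field names, `SAMPLE_REPORT`, and `parse_report` are hypothetical and are not the authors' algorithm.

```python
import re
from typing import Optional

# Hypothetical patterns for synoptic-style report text (illustration only).
GLEASON = re.compile(r"gleason\s+score\s*[:\-]?\s*(\d)\s*\+\s*(\d)", re.I)
T_STAGE = re.compile(r"\bp(T[0-4][a-c]?)\b", re.I)
N_STAGE = re.compile(r"\bp(N[01X])\b", re.I)
MARGIN = re.compile(r"margins?\s*[:\-]?\s*(positive|negative)", re.I)

# "not identified" is listed before "identified" so the negated phrase wins.
FINDING = r"{label}\s*[:\-]?\s*(not identified|absent|negative|present|identified|positive)"
ECE = re.compile(FINDING.format(label=r"extra(?:prostatic|capsular)\s+extension"), re.I)
SVI = re.compile(FINDING.format(label=r"seminal\s+vesicle\s+invasion"), re.I)
NEGATIVE_TERMS = {"not identified", "absent", "negative"}


def _finding(pattern: re.Pattern, text: str) -> Optional[bool]:
    """True if the finding is reported present, False if absent, None if not found."""
    m = pattern.search(text)
    return None if m is None else m.group(1).lower() not in NEGATIVE_TERMS


def parse_report(text: str) -> dict:
    """Extract a few structured fields from free-text pathology (hypothetical sketch)."""
    g = GLEASON.search(text)
    t = T_STAGE.search(text)
    n = N_STAGE.search(text)
    m = MARGIN.search(text)
    return {
        "gleason": (int(g.group(1)), int(g.group(2))) if g else None,
        "pT": "T" + t.group(1)[1:].lower() if t else None,   # normalize to e.g. "T3a"
        "pN": "N" + n.group(1)[1:].upper() if n else None,   # normalize to e.g. "N0"
        "margin_positive": m.group(1).lower() == "positive" if m else None,
        "ece": _finding(ECE, text),
        "svi": _finding(SVI, text),
    }


# Made-up example text, not a real report.
SAMPLE_REPORT = """
Gleason score: 3 + 4 = 7
Surgical margins: Negative
Extraprostatic extension: Present, focal
Seminal vesicle invasion: Not identified
Pathologic stage: pT3a pN0
"""
print(parse_report(SAMPLE_REPORT))
```

Real reports vary in wording, negation, and layout (addenda, free-text comments), so a deployed parser would need far more patterns or a different technique; this sketch only shows the general idea of turning free text into the structured fields the study compares.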
There were 523 patients with NLP-extracted data, 319 with SDE data, and 555 with manually abstracted data. For Gleason scores, NLP and clinician SDE accuracy was 95.6% and 95.8%, respectively, compared with manual abstraction, with a concordance (κ) of 0.93 (95% CI, 0.89 to 0.98). For margin status, extracapsular extension, seminal vesicle invasion, stage, and lymph node status, NLP accuracy was 94.8% to 100%, SDE accuracy was 87.7% to 100%, and concordance between NLP and SDE ranged from 0.92 to 1.0.
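The two metrics reported above can be sketched as follows. Accuracy is the proportion of cases in which a method agrees with the manual-abstraction gold standard, and Cohen's κ = (p_o − p_e)/(1 − p_e) corrects the observed agreement p_o between NLP and SDE for the agreement p_e expected by chance from each method's label frequencies. This is a minimal Python illustration with made-up labels; the function names are hypothetical, and it does not reproduce the study's confidence-interval method, which the abstract does not specify.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(pred: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Proportion of cases where a method's label equals the gold-standard label."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two raters; no gold standard is assumed."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1 - p_e)


# Toy margin-status labels (1 = positive, 0 = negative); NOT study data.
gold = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
nlp = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
sde = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(accuracy(nlp, gold), accuracy(sde, gold))  # 1.0 0.9
print(round(cohens_kappa(nlp, sde), 3))          # 0.737
```

In this toy case the two methods agree on 9 of 10 cases (p_o = 0.9), but chance agreement is already 0.62, so κ is about 0.74 rather than 0.9; this is why the abstract reports κ rather than raw agreement when comparing NLP with SDE.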
We show that, in a busy clinical practice, both a real-world deployment of an NLP algorithm to extract pathology data and structured data entry by clinicians during routine clinical care can generate data that are accurate relative to manual abstraction for some, but not all, components of a prostate pathology report.