Automated Extraction of Grade, Stage, and Quality Information From Transurethral Resection of Bladder Tumor Pathology Reports Using Natural Language Processing.

Author information

Glaser Alexander P, Jordan Brian J, Cohen Jason, Desai Anuj, Silberman Philip, Meeks Joshua J

Affiliations

Alexander P. Glaser, Brian J. Jordan, Jason Cohen, Anuj Desai, Joshua J. Meeks, Feinberg School of Medicine, Northwestern University; Alexander P. Glaser, Brian J. Jordan, Joshua J. Meeks, Robert H. Lurie Comprehensive Cancer Center, Northwestern University; and Philip Silberman, Clinical and Translational Sciences Institute, Northwestern University, Chicago, IL.

Publication information

JCO Clin Cancer Inform. 2018 Dec;2:1-8. doi: 10.1200/CCI.17.00128.

PMID:30652586

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7010439/

Abstract

PURPOSE

Bladder cancer is initially diagnosed and staged with a transurethral resection of bladder tumor (TURBT). Patient survival is dependent on appropriate sampling of layers of the bladder, but pathology reports are dictated as free text, making large-scale data extraction for quality improvement challenging. We sought to automate extraction of stage, grade, and quality information from TURBT pathology reports using natural language processing (NLP).

METHODS

Patients undergoing TURBT were retrospectively identified using the Northwestern Enterprise Data Warehouse. An NLP algorithm was then created to extract information from free-text pathology reports and was iteratively improved using a training set of manually reviewed TURBTs. NLP accuracy was then validated using another set of manually reviewed TURBTs, and reliability was calculated using Cohen's κ.
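The abstract does not say how the NLP algorithm works internally. As a minimal illustration of rule-based extraction from free-text reports, the sketch below assumes a keyword and regular-expression approach. Every pattern, the precedence order (T2 > T1 > Ta > isolated CIS > benign), the simple "negation cue earlier in the same sentence" rule, and the names `TurbtFindings`, `affirmed`, and `extract` are hypothetical and are not taken from the study. The sample report is invented.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword rules for illustration only; not the published algorithm.
NEGATION = re.compile(r"\b(?:no|negative for|without)\b", re.I)
# "invades/invasion/invading ... muscularis propria", but not "non-invasive".
T2 = re.compile(r"(?<!non-)\binva\w*(?:\s+\w+){0,3}\s+(?:muscularis propria|detrusor muscle)", re.I)
T1 = re.compile(r"(?<!non-)\binva\w*(?:\s+\w+){0,3}\s+(?:lamina propria|subepithelial connective tissue)", re.I)
TA = re.compile(r"papillary urothelial carcinoma", re.I)
CIS = re.compile(r"carcinoma in situ|\bCIS\b", re.I)
HIGH_GRADE = re.compile(r"\bhigh[- ]grade\b", re.I)
LOW_GRADE = re.compile(r"\blow[- ]grade\b", re.I)
MP_MENTION = re.compile(r"muscularis propria|detrusor muscle", re.I)
MP_PRESENT = re.compile(r"(?:muscularis propria|detrusor muscle)\s+(?:is\s+)?(?:present|identified|seen)", re.I)
MP_ABSENT = re.compile(
    r"\bno\s+(?:muscularis propria|detrusor muscle)"
    r"|(?:muscularis propria|detrusor muscle)\s+(?:is\s+)?(?:absent|not\s+(?:identified|present|seen))",
    re.I,
)


@dataclass
class TurbtFindings:
    stage: str                  # "T2", "T1", "Ta", "CIS", or "benign"
    grade: Optional[str]        # "high", "low", or None if not stated
    mp_reported: bool           # muscularis propria mentioned anywhere in the report
    mp_present: Optional[bool]  # None when the text does not say either way


def affirmed(pattern: re.Pattern, text: str) -> bool:
    """True if `pattern` matches in some sentence with no negation cue before the match."""
    for sentence in re.split(r"[.;\n]", text):
        m = pattern.search(sentence)
        if m and not NEGATION.search(sentence[: m.start()]):
            return True
    return False


def extract(report: str) -> TurbtFindings:
    # Highest stage wins; CIS is assigned only when no papillary or invasive tumor is found.
    stage = next(
        (label for label, pat in (("T2", T2), ("T1", T1), ("Ta", TA), ("CIS", CIS))
         if affirmed(pat, report)),
        "benign",
    )
    grade = "high" if affirmed(HIGH_GRADE, report) else "low" if affirmed(LOW_GRADE, report) else None
    # Muscle-invasive tumor implies muscularis propria was sampled.
    if stage == "T2" or affirmed(MP_PRESENT, report):
        mp_present = True
    elif MP_ABSENT.search(report):
        mp_present = False
    else:
        mp_present = None
    return TurbtFindings(stage, grade, bool(MP_MENTION.search(report)), mp_present)


sample = ("Bladder, TURBT: High-grade papillary urothelial carcinoma, invasive into lamina propria. "
          "Muscularis propria is present and not involved.")
print(extract(sample))
# → TurbtFindings(stage='T1', grade='high', mp_reported=True, mp_present=True)
```

Rules of this kind only look for a negation cue earlier in the same sentence, so free-text pathologist notes or findings spread across several specimen parts can easily defeat them; this is why such rules would need iterative tuning against manually reviewed reports, as the Methods describe.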

RESULTS

Of 3,042 TURBTs identified from 2006 to 2016, 39% were classified as benign, 35% as Ta, 11% as T1, 4% as T2, and 10% as isolated carcinoma in situ. Of 500 randomly selected manually reviewed TURBTs, NLP correctly staged 88% of specimens (κ = 0.82; 95% CI, 0.78 to 0.86). Of 272 manually reviewed T1 tumors, NLP correctly categorized grade in 100% of tumors (κ = 1), correctly categorized if muscularis propria was reported by the pathologist in 98% of tumors (κ = 0.81; 95% CI, 0.62 to 0.99), and correctly categorized if muscularis propria was present or absent in the resection specimen in 82% of tumors (κ = 0.62; 95% CI, 0.55 to 0.73). Discrepancy analysis revealed pathologist notes and deeper resection specimens as frequent reasons for NLP misclassifications.
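Cohen's κ corrects the observed agreement p_o between NLP and manual review for the agreement p_e expected by chance from each rater's label frequencies: κ = (p_o - p_e) / (1 - p_e). This is why κ for staging (0.82) sits below the raw 88% agreement. The sketch below computes unweighted κ in plain Python. The labels are hypothetical, not study data, and the abstract does not state how its 95% CIs were derived, so the sketch does not attempt them.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1:  # both raters used a single identical label; kappa is undefined
        return math.nan
    return (p_o - p_e) / (1 - p_e)


# Hypothetical stage labels for 10 specimens (manual review vs. NLP output).
manual = ["Ta", "Ta", "T1", "T1", "benign", "benign", "CIS", "T2", "Ta", "benign"]
nlp    = ["Ta", "Ta", "T1", "Ta", "benign", "benign", "CIS", "T2", "Ta", "T1"]
print(round(cohens_kappa(manual, nlp), 3))  # → 0.737
```

Here the two raters agree on 8 of 10 specimens (p_o = 0.80), but chance agreement from the label frequencies is p_e = 0.24, giving κ ≈ 0.74.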

CONCLUSION

We developed an NLP algorithm that demonstrates a high degree of reliability in extracting stage, grade, and presence of muscularis propria from TURBT pathology reports. Future iterations can continue to improve performance, but automated extraction of oncologic information is promising in improving quality and assisting physicians in delivery of care.
