Wieneke Arika E, Bowles Erin J A, Cronkite David, Wernli Karen J, Gao Hongyuan, Carrell David, Buist Diana S M
Group Health Research Institute, Seattle, WA, USA.
J Pathol Inform. 2015 Jun 23;6:38. doi: 10.4103/2153-3539.159215. eCollection 2015.
Pathology reports typically require manual review to abstract research data. We developed a natural language processing (NLP) system to automatically interpret free-text breast pathology reports with limited assistance from manual abstraction.
We iteratively applied machine learning algorithms and constructed groups of related findings to identify breast-related procedures and results from free-text pathology reports. We evaluated the NLP system using an all-or-nothing approach to determine which reports could be processed entirely by NLP and which needed manual review beyond NLP. We divided 3234 reports into development (2910, 90%) and evaluation (324, 10%) sets, using manually reviewed pathology data as our gold standard.
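The all-or-nothing evaluation described above can be illustrated with a minimal sketch. It is not the study's implementation: the function name `score_report`, the outcome labels, the order of the checks, and the toy reports are all assumptions made for illustration. A report counts as correct only if every code NLP assigned matches the gold standard exactly.

```python
from collections import Counter

def score_report(nlp_codes, gold_codes, flagged):
    """Assign one report to a single all-or-nothing outcome (hypothetical rules).

    nlp_codes / gold_codes: set of finding codes, or None when the report
    was judged not breast-related. flagged: True when NLP deferred the
    report to manual review.
    """
    if flagged:
        return "manual_review"
    if nlp_codes is None and gold_codes is None:
        return "correctly_omitted"
    if nlp_codes == gold_codes:
        return "correct"          # every finding matches, nothing extra
    return "incorrect"            # any single mismatch fails the whole report

# Toy reports (invented, not study data): (nlp_codes, gold_codes, flagged)
toy_reports = [
    ({"invasive ductal"}, {"invasive ductal"}, False),
    ({"invasive ductal"}, {"invasive ductal", "rare finding"}, False),
    (set(), {"rare finding"}, True),
    (None, None, False),
]

outcomes = Counter(score_report(*r) for r in toy_reports)
for label, count in outcomes.items():
    print(f"{label}: {count / len(toy_reports):.1%}")
```

Under this scheme a single missed rare finding makes an otherwise accurate report count as incorrect, so report-level accuracy can be far lower than per-finding accuracy.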
NLP correctly coded 12.7% of the evaluation set, flagged 49.1% of reports for manual review, incorrectly coded 30.8%, and correctly omitted 7.4% from the evaluation set due to irrelevancy (i.e. not breast-related). Common procedures and results were identified correctly (e.g. invasive ductal with 95.5% precision and 94.0% sensitivity), but entire reports were flagged for manual review because of rare findings and substantial variation in pathology report text.
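For readers less familiar with the per-finding metrics quoted above, the following sketch shows how precision and sensitivity are conventionally computed from true-positive, false-positive, and false-negative counts. The helper names and the counts are hypothetical and are not taken from the study.

```python
def precision(tp: int, fp: int) -> float:
    """Share of NLP-identified findings that the gold standard confirms."""
    return tp / (tp + fp)

def sensitivity(tp: int, fn: int) -> float:
    """Share of gold-standard findings that NLP identified (also called recall)."""
    return tp / (tp + fn)

# Hypothetical counts for one finding type (illustration only, not study data)
tp, fp, fn = 90, 10, 30

print(f"precision:   {precision(tp, fp):.1%}")
print(f"sensitivity: {sensitivity(tp, fn):.1%}")
```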
The NLP system we developed did not perform well enough to abstract entire breast pathology reports. The all-or-nothing approach resulted in too broad a scope of work and limited our flexibility to identify breast pathology procedures and results. Our NLP system was also limited by the lack of gold standard data on rare findings and by wide variation in pathology text. Focusing on individual, common elements and improving the standardization of pathology report text may improve performance.