Automated Generation of Synoptic Reports from Narrative Pathology Reports in University Malaya Medical Centre Using Natural Language Processing.

Authors

Tan Wee-Ming, Teoh Kean-Hooi, Ganggayah Mogana Darshini, Taib Nur Aishah, Zaini Hana Salwani, Dhillon Sarinder Kaur

Affiliations

Data Science & Bioinformatics Laboratory, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur 50603, Malaysia.

Laboratory Department, Sunway Medical Centre, Bandar Sunway 47500, Malaysia.

Publication information

Diagnostics (Basel). 2022 Apr 1;12(4):879. doi: 10.3390/diagnostics12040879.

Abstract

Pathology reports are a primary source of information for cancer registries. University Malaya Medical Centre (UMMC) is a tertiary hospital responsible for training pathologists, so narrative reporting is important there. However, unstructured free-text reports make information extraction tedious for clinical audits and data analysis-related research. This study aims to develop an automated natural language processing (NLP) algorithm that summarizes existing narrative breast pathology reports from UMMC into more concise, structured synoptic pathology reports, using a checklist-style report template to ease the creation of pathology reports. The rule-based NLP algorithm was developed in the R programming language using 593 pathology specimens from 174 patients provided by the Department of Pathology, UMMC. The pathologist provided specific keywords for each data element to define the semantic rules of the NLP algorithm. The system was evaluated by calculating precision, recall, and F1-score. The proposed NLP algorithm achieved a micro-F1 score of 99.50% and a macro-F1 score of 98.97% on 178 specimens with 25 data elements. This result corresponds to clinicians' needs and could improve communication between pathologists and clinicians. The study is significant because structured data are easily minable and could generate important insights.
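
The abstract reports both a micro-F1 and a macro-F1 score across data elements. The sketch below shows how the two averages are commonly computed from per-element counts of true positives, false positives, and false negatives. It is written in Python for illustration (the study itself used R), and the element names and counts are hypothetical, not taken from the paper. The paper's exact evaluation procedure may differ.

```python
from collections import namedtuple

# Per-data-element confusion counts (hypothetical values for illustration only).
Counts = namedtuple("Counts", ["tp", "fp", "fn"])


def f1(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_element):
    """Pool the counts over all data elements, then compute a single F1."""
    tp = sum(c.tp for c in per_element.values())
    fp = sum(c.fp for c in per_element.values())
    fn = sum(c.fn for c in per_element.values())
    return f1(tp, fp, fn)


def macro_f1(per_element):
    """Compute F1 for each data element, then take the unweighted mean."""
    scores = [f1(c.tp, c.fp, c.fn) for c in per_element.values()]
    return sum(scores) / len(scores)


# Hypothetical data elements; the study evaluated 25 of them.
counts = {
    "tumour_size": Counts(tp=90, fp=0, fn=0),
    "histological_grade": Counts(tp=8, fp=2, fn=2),
}

print(f"micro-F1 = {micro_f1(counts):.4f}")
print(f"macro-F1 = {macro_f1(counts):.4f}")
```

Micro-averaging weights every extracted value equally, so frequently occurring data elements dominate the score. Macro-averaging weights every data element equally, so a rarely reported element with a few errors pulls the score down more. That is one plausible reason a macro-F1 can sit slightly below the micro-F1, as it does in the reported results.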

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/05b8/9027647/fd9de99dc9c2/diagnostics-12-00879-g001.jpg
