Nguyen Anthony, Moore Julie, Lawley Michael, Hansen David, Colquist Shoni
The Australian E-Health Research Centre, CSIRO ICT Centre, Brisbane, Australia.
Stud Health Technol Inform. 2011;168:117-24.
To develop a system for the automatic classification of Cancer Registry notifications data from free-text pathology reports.
The extraction of cancer notification items is based on a symbolic rule-based classification methodology, in which formal semantics are used to reason over the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) concepts identified in the free text. Business rules for cancer notifications used by Cancer Registry coding staff were also incorporated, with the aim of mimicking Cancer Registry processes.
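As a rough illustration of reasoning over concepts with a business rule on top, the sketch below checks whether any concept found in a report is subsumed by a "malignant neoplasm" concept in a toy is-a hierarchy. The concept names, the hierarchy, and the functions `is_subsumed_by` and `is_notifiable` are all hypothetical placeholders. They are not real SNOMED CT identifiers and not the system's actual rules.

```python
# Toy is-a hierarchy: child -> set of parents.
# Names are illustrative placeholders, not real SNOMED CT identifiers.
IS_A = {
    "adenocarcinoma_of_lung": {"malignant_neoplasm_of_lung"},
    "malignant_neoplasm_of_lung": {"malignant_neoplasm"},
    "malignant_neoplasm": {"neoplasm"},
    "benign_naevus": {"benign_neoplasm"},
    "benign_neoplasm": {"neoplasm"},
    "neoplasm": set(),
}


def is_subsumed_by(concept, ancestor):
    """Return True if `concept` equals `ancestor` or reaches it via is-a links."""
    stack, seen = [concept], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(IS_A.get(current, ()))
    return False


def is_notifiable(concepts):
    """Hypothetical business rule: notifiable if any concept is a malignant neoplasm."""
    return any(is_subsumed_by(c, "malignant_neoplasm") for c in concepts)


print(is_notifiable(["adenocarcinoma_of_lung"]))
print(is_notifiable(["benign_naevus"]))
```

A real system would take the concepts from a free-text concept identifier and the hierarchy from a terminology server or description-logic classifier; the hard-coded dictionary only stands in for that step.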
The system was developed on a corpus of 239 histology and cytology reports (60% notifiable) and then evaluated on an independent set of 300 reports (20% notifiable). Results show that the system can reliably classify notifiable reports, with 96% and 100% specificity, and achieves an overall accuracy of 82% and 74% for classifying notification items from notifiable reports at a unit record level, on the development and evaluation sets respectively.
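For readers unfamiliar with the reported measures, the sketch below shows how specificity and accuracy are conventionally computed. The counts passed in are purely illustrative and are not taken from the study's confusion matrices; the function names are assumptions made for this example.

```python
def specificity(true_negatives, false_positives):
    """Fraction of non-notifiable reports correctly classified as non-notifiable."""
    return true_negatives / (true_negatives + false_positives)


def accuracy(correct, total):
    """Fraction of items classified correctly."""
    return correct / total


# Illustrative counts only (hypothetical, not the study's data).
print(specificity(true_negatives=96, false_positives=4))
print(accuracy(correct=82, total=100))
```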
Cancer Registries collect a multitude of data that requires manual review, slowing down the flow of information. Extracting and providing an automatically coded cancer pathology notification for review can lessen the reliance on expert clinical staff, improving the efficiency and availability of cancer information.