Nguyen Anthony N, Moore Julie, O'Dwyer John, Philpot Shoni
The Australian e-Health Research Centre, CSIRO, Brisbane, Australia.
Queensland Cancer Control Analysis Team, Queensland Health, Brisbane, Australia.
AMIA Annu Symp Proc. 2015 Nov 5;2015:953-62. eCollection 2015.
Cancer Registries record cancer data by reading and interpreting pathology cancer specimen reports. For some Registries this can be a manual process, which is labour- and time-intensive and subject to errors. A system for automatic extraction of cancer data from HL7 electronic free-text pathology reports has been proposed to improve the workflow efficiency of the Cancer Registry. The system is currently processing an incoming trickle feed of HL7 electronic pathology reports from across the state of Queensland in Australia to produce an electronic cancer notification. Natural language processing and symbolic reasoning using SNOMED CT were adopted in the system; Queensland Cancer Registry business rules were also incorporated. A set of 220 unseen pathology reports selected from patients with a range of cancers was used to evaluate the performance of the system. The system achieved an overall recall of 0.78, precision of 0.83 and F-measure of 0.80 over seven categories, namely basis of diagnosis (3 classes), primary site (66 classes), laterality (5 classes), histological type (94 classes), histological grade (7 classes), metastasis site (19 classes) and metastatic status (2 classes). These results are encouraging given the large cross-section of cancers. The system allows for the provision of clinical coding support as well as indicative statistics on the current state of cancer that are not otherwise available.
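As a quick consistency check on the reported figures, the F-measure can be recomputed from the overall precision and recall. The sketch below assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not state how the overall figures were aggregated across the seven categories, and the function name `f_measure` is illustrative, not taken from the system.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score, a weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (assumed here; the abstract
    does not name the variant it reports).
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is correct
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Overall figures reported in the abstract.
overall_f = f_measure(precision=0.83, recall=0.78)
print(f"{overall_f:.2f}")  # prints 0.80
```

Under this assumption the recomputed value rounds to the reported 0.80, so the three headline numbers are mutually consistent.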