School of Information Technologies, University of Sydney, Sydney, New South Wales, Australia.
J Am Med Inform Assoc. 2014 Sep-Oct;21(5):893-901. doi: 10.1136/amiajnl-2013-002516. Epub 2014 May 22.
This paper presents an automated system for classifying the results of imaging examinations (CT, MRI, positron emission tomography) into reportable and non-reportable cancer cases. This system is part of an industrial-strength processing pipeline built to extract content from radiology reports for use in the Victorian Cancer Registry.
In addition to traditional supervised learning methods such as conditional random fields and support vector machines, active learning (AL) approaches were investigated to optimize the production of training data and further improve classification performance. The project involved two pilot sites in Victoria, Australia, namely Lake Imaging (Ballarat) and Peter MacCallum Cancer Centre (Melbourne), and, in collaboration with the NSW Central Registry, one pilot site at Westmead Hospital (Sydney).
The reportability classifier achieved 98.25% sensitivity and 96.14% specificity on the cancer registry's held-out test set. AL can save up to 92% of the training data needed for supervised machine learning.
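For readers less familiar with the two metrics, the following minimal sketch shows how sensitivity and specificity are conventionally computed from binary predictions. It assumes that "reportable" is the positive class (label 1) and "non-reportable" the negative class (label 0). The function name and the toy labels are illustrative only and are not taken from the paper.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) for binary labels.

    Assumption: 1 = reportable (positive), 0 = non-reportable (negative).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # reportable report correctly flagged
        elif truth == 1 and pred == 0:
            fn += 1  # reportable report missed
        elif truth == 0 and pred == 0:
            tn += 1  # non-reportable report correctly filtered out
        else:
            fp += 1  # non-reportable report wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # share of reportable cases that were caught
    specificity = tn / (tn + fp)  # share of non-reportable cases that were rejected
    return sensitivity, specificity


# Toy labels, invented for illustration (not data from the study).
toy_truth = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_sens, toy_spec = sensitivity_specificity(toy_truth, toy_pred)
```

For a registry, high sensitivity means few reportable cases are missed, while high specificity means few irrelevant reports are passed on for manual review.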
AL is a promising method for optimizing the production of training data for supervised classification of radiology reports. When an AL strategy is applied during data selection, the cost of manual classification can be reduced significantly.
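The abstract does not state which AL strategy was used. As one common illustration of how AL reduces labeling effort, the sketch below implements pool-based uncertainty sampling: in each round the current model scores the unlabeled pool, and only the reports it is least sure about are sent to a human annotator. All names here (`select_most_uncertain`, `active_learning_loop`, and the `train`, `predict_proba`, and `oracle` callables) are hypothetical placeholders, not the authors' implementation.

```python
from typing import Any, Callable, List, Sequence, Tuple


def select_most_uncertain(probs: Sequence[float], k: int) -> List[int]:
    """Return indices of the k predicted probabilities closest to 0.5."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]


def active_learning_loop(
    pool: Sequence[Any],
    seed_labeled: Sequence[Tuple[Any, int]],
    train: Callable[[List[Tuple[Any, int]]], Any],   # hypothetical: fits a model
    predict_proba: Callable[[Any, Any], float],      # hypothetical: P(reportable)
    oracle: Callable[[Any], int],                    # hypothetical: human annotator
    batch_size: int,
    rounds: int,
) -> Tuple[Any, List[Tuple[Any, int]]]:
    """Pool-based uncertainty sampling (a generic sketch, not the paper's method)."""
    labeled = list(seed_labeled)
    unlabeled = list(pool)
    model = train(labeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        # Score every unlabeled item with the current model.
        probs = [predict_proba(model, x) for x in unlabeled]
        picked = select_most_uncertain(probs, batch_size)
        # Pop from the highest index down so earlier indices stay valid.
        for i in sorted(picked, reverse=True):
            item = unlabeled.pop(i)
            labeled.append((item, oracle(item)))
        # Retrain on the enlarged labeled set.
        model = train(labeled)
    return model, labeled
```

The saving comes from the selection step: items the model already classifies confidently are never sent for manual labeling, so the annotator's time goes to the examples most likely to change the decision boundary.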
The most important practical application of the reportability classifier is that it can dramatically reduce human effort in identifying relevant reports from the large imaging pool for further investigation of cancer. The classifier is built on a large real-world dataset and can achieve high performance in filtering relevant reports to support cancer registries.