Clinical Document Classification Using Labeled and Unlabeled Data Across Hospitals

Authors

Hassanzadeh Hamed, Kholghi Mahnoosh, Nguyen Anthony, Chu Kevin

Affiliations

Australian e-Health Research Centre, CSIRO, Brisbane, QLD, Australia.

Royal Brisbane and Women's Hospital, Brisbane, QLD, Australia.

Publication information

AMIA Annu Symp Proc. 2018 Dec 5;2018:545-554. eCollection 2018.

PMID: 30815095

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6371298/

Abstract

Reviewing radiology reports in emergency departments is an essential but laborious task. Timely follow-up of patients with abnormal findings in their radiology reports may dramatically affect patient outcomes, especially if they have been discharged with a different initial diagnosis. Machine learning approaches have been devised to expedite the process and detect the cases that demand immediate follow-up. However, these approaches require a large amount of labeled data to train reliable predictive models. Preparing such a large dataset, which must be manually annotated by health professionals, is costly and time-consuming. This paper investigates a semi-supervised transfer learning framework for radiology report classification across three hospitals. The main goal is to leverage both widely available unlabeled clinical data and previously learned knowledge to improve a learning model where limited labeled data is available. Our experimental findings show that (1) convolutional neural networks (CNNs), while being independent of any problem-specific feature engineering, achieve significantly higher effectiveness than conventional supervised learning approaches, (2) leveraging unlabeled data in training a CNN-based classifier reduces the dependency on labeled data by more than 50% while reaching the same performance as a fully supervised CNN, and (3) transferring the knowledge gained from available labeled data in an external source hospital significantly improves the performance of a semi-supervised CNN model over its fully supervised counterpart in a target hospital.

