Dealing with distribution mismatch in semi-supervised deep learning for COVID-19 detection using chest X-ray images: A novel approach using feature densities.

Authors

Calderon-Ramirez Saul, Yang Shengxiang, Elizondo David, Moemeni Armaghan

Affiliations

Institute of Artificial Intelligence (IAI), School of Computer Science and Informatics, De Montfort University, United Kingdom.

Instituto Tecnologico de Costa Rica, Costa Rica.

Publication information

Appl Soft Comput. 2022 Jul;123:108983. doi: 10.1016/j.asoc.2022.108983. Epub 2022 May 10.

Abstract

In the context of the global coronavirus pandemic, different deep learning solutions for infected subject detection using chest X-ray images have been proposed. However, deep learning models usually need large labelled datasets to be effective. Semi-supervised deep learning is an attractive alternative, where unlabelled data is leveraged to improve the model's overall accuracy. However, in real-world usage settings, an unlabelled dataset might present a different distribution from the labelled dataset (e.g. the labelled dataset was sampled from one clinic and the unlabelled dataset from another). This results in a distribution mismatch between the unlabelled and labelled datasets. In this work, we assess the impact of the distribution mismatch between the labelled and the unlabelled datasets for a semi-supervised model trained with chest X-ray images for COVID-19 detection. Under strong distribution mismatch conditions, we found an accuracy hit of almost 30%, suggesting that the unlabelled dataset distribution has a strong influence on the behaviour of the model. Therefore, we propose a straightforward approach to diminish the impact of such distribution mismatch. Our proposed method uses a density approximation of the feature space. It is built upon the target dataset to filter out the observations in the source unlabelled dataset that might harm the accuracy of the semi-supervised model. It assumes that a small labelled source dataset is available together with a larger source unlabelled dataset. Our proposed method does not require any model training, and it is simple and computationally cheap. We compare our proposed method against two popular state-of-the-art out-of-distribution data detectors, which are also cheap and simple to implement. In our tests, our method yielded accuracy gains of up to 32% when compared to the previous state-of-the-art methods. The good results yielded by our method lead us to argue in favour of a more data-centric approach to improving model accuracy. Furthermore, the developed method can be used to measure data effectiveness for semi-supervised deep learning model training.
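
The filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the density estimator or the selection rule, so the per-dimension histograms, the log-density score, the `keep_fraction` cut-off, and all function names below are assumptions made for illustration only. The feature vectors are assumed to have been extracted beforehand by some fixed, pre-trained network, so no model training is involved.

```python
import numpy as np


def fit_feature_histograms(target_features, n_bins=10):
    """Approximate the target feature density with one histogram per dimension.

    Assumption: dimensions are treated independently (a naive factorisation).
    """
    n_samples = target_features.shape[0]
    # Probability assigned to an empty bin after Laplace smoothing; also used
    # for values that fall outside the range seen in the target dataset.
    floor = 1.0 / (n_samples + n_bins)
    model = []
    for column in target_features.T:
        counts, bin_edges = np.histogram(column, bins=n_bins)
        probs = (counts + 1.0) / (n_samples + n_bins)
        model.append((bin_edges, probs, floor))
    return model


def log_density_score(features, model):
    """Sum of per-dimension log-probabilities; higher means closer to the target."""
    scores = np.zeros(features.shape[0])
    for d, (bin_edges, probs, floor) in enumerate(model):
        x = features[:, d]
        # Locate the bin of each value; clip so indexing is always valid.
        idx = np.searchsorted(bin_edges, x, side="right") - 1
        idx = np.clip(idx, 0, len(probs) - 1)
        log_p = np.log(probs[idx])
        # Values outside the target range get the smoothing floor.
        outside = (x < bin_edges[0]) | (x > bin_edges[-1])
        log_p[outside] = np.log(floor)
        scores += log_p
    return scores


def filter_unlabelled(unlabelled_features, target_features,
                      keep_fraction=0.5, n_bins=10):
    """Keep the unlabelled observations that score highest under the target density."""
    model = fit_feature_histograms(target_features, n_bins)
    scores = log_density_score(unlabelled_features, model)
    n_keep = int(round(keep_fraction * len(scores)))
    keep_idx = np.argsort(scores)[::-1][:n_keep]
    return np.sort(keep_idx), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(0.0, 1.0, size=(500, 8))       # stand-in target features
    unlabelled = np.vstack([
        rng.normal(0.0, 1.0, size=(100, 8)),           # similar to the target
        rng.normal(6.0, 1.0, size=(100, 8)),           # strongly mismatched
    ])
    kept, _ = filter_unlabelled(unlabelled, target, keep_fraction=0.5)
    print("kept", len(kept), "of", len(unlabelled), "unlabelled observations")
```

The retained indices would then select the unlabelled images passed to the semi-supervised learner. Both `keep_fraction` and `n_bins` are hypothetical tuning knobs and would need to be chosen for the data at hand.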


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/164d/9085448/4342683c078c/gr1_lrg.jpg
