PediCXR: An open, large-scale chest radiograph dataset for interpretation of common thoracic diseases in children.

Affiliations

Smart Health Center, VinBigData JSC, Hanoi, Vietnam.

College of Engineering & Computer Science, VinUniversity, Hanoi, Vietnam.

Publication information

Sci Data. 2023 Apr 27;10(1):240. doi: 10.1038/s41597-023-02102-5.

Abstract

Computer-aided diagnosis systems in adult chest radiography (CXR) have recently achieved great success thanks to the availability of large-scale, annotated datasets and the advent of high-performance supervised learning algorithms. However, the development of diagnostic models for detecting and diagnosing pediatric diseases in CXR scans has been hindered by the lack of high-quality physician-annotated datasets. To overcome this challenge, we introduce and release PediCXR, a new pediatric CXR dataset of 9,125 studies retrospectively collected from a major pediatric hospital in Vietnam between 2020 and 2021. Each scan was manually annotated by a pediatric radiologist with more than ten years of experience. The dataset was labeled for the presence of 36 critical findings and 15 diseases. In particular, each abnormal finding was marked with a rectangular bounding box on the image. To the best of our knowledge, this is the first and largest pediatric CXR dataset containing lesion-level annotations and image-level labels for the detection of multiple findings and diseases. For algorithm development, the dataset was divided into a training set of 7,728 studies and a test set of 1,397 studies. To encourage new advances in pediatric CXR interpretation using data-driven approaches, we provide a detailed description of the PediCXR data sample and make the dataset publicly available at https://physionet.org/content/vindr-pcxr/1.0.0/.
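
As a quick sanity check on the split described in the abstract, the short sketch below confirms that the training and test subsets account for all 9,125 studies and reports the share of each. The constant and function names are illustrative only and are not taken from the dataset's own tooling; only the three counts come from the abstract.

```python
# Study counts as stated in the abstract.
TOTAL_STUDIES = 9125
TRAIN_STUDIES = 7728
TEST_STUDIES = 1397


def split_fractions(train: int, test: int) -> tuple[float, float]:
    """Return the (train, test) shares of the combined study count."""
    total = train + test
    return train / total, test / total


# The two subsets should account for every study in the dataset.
assert TRAIN_STUDIES + TEST_STUDIES == TOTAL_STUDIES

train_frac, test_frac = split_fractions(TRAIN_STUDIES, TEST_STUDIES)
print(f"train: {train_frac:.1%}, test: {test_frac:.1%}")
```

With these counts the split works out to roughly 85% training and 15% test.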
