Hospital Israelita Albert Einstein - Big Data Analytics, São Paulo, Brazil.
Hospital Israelita Albert Einstein - Imaging Department, São Paulo, Brazil.
Sci Data. 2022 Aug 10;9(1):487. doi: 10.1038/s41597-022-01608-8.
Chest radiographs allow for the meticulous examination of a patient's chest but demand specialized training for proper interpretation. Automated analysis of medical imaging has become increasingly accessible with the advent of machine learning (ML) algorithms. Large labeled datasets are key elements for the training and validation of these ML solutions. In this paper, we describe the Brazilian labeled chest x-ray dataset, BRAX: an automatically labeled dataset designed to assist researchers in the validation of ML models. The dataset contains 24,959 chest radiography studies from patients presenting to a large general Brazilian hospital. A total of 40,967 images are available in the BRAX dataset. All images have been verified by trained radiologists and de-identified to protect patient privacy. Fourteen labels were derived from free-text radiology reports written in Brazilian Portuguese using Natural Language Processing.