Bitto Abu Kowshir, Bijoy Md Hasan Imam, Shakil Kamrul Hassan, Das Aka, Biplob Khalid Been Badruzzaman, Mahmud Imran, Hossain Syed Md Minhaz
Department of Software Engineering, Daffodil International University, Dhaka 1216, Bangladesh.
Department of Computer Science and Engineering, Daffodil International University, Dhaka 1216, Bangladesh.
Data Brief. 2025 May 1;60:111572. doi: 10.1016/j.dib.2025.111572. eCollection 2025 Jun.
The gastrointestinal (GI) system is fundamental to human health, supporting digestion, nutrient absorption, and waste elimination. Disorders of GI function, such as Gastroesophageal Reflux Disease (GERD) and gastrointestinal polyps, can lead to significant health complications if they are not diagnosed and managed early. However, manual interpretation of endoscopic images is time-consuming and prone to human error, which highlights the need for automated diagnostic tools. In this study, we introduce a dataset of 24,036 high-quality endoscopic images, categorized into four classes: GERD, GERD Normal, Polyp, and Polyp Normal. The dataset is designed to support research on the automated detection and classification of these conditions with machine learning algorithms. It consists of 4006 primary images collected following endoscopic procedures, which were augmented using six distinct techniques, bringing the total number of images to 24,036. It includes 5844 images of GERD cases (974 primary images), 6618 images of GERD Normal (1103 primary images), 4674 images of Polyps (779 primary images), and 6900 images of Polyp Normal (1150 primary images). The images were obtained from Zainul Haque Sikder Women's Medical College & Hospital (Pvt.) Ltd., pre-processed, resized to 512 × 512 pixels, and saved in JPG format. This dataset addresses a critical gap in the availability of large, diverse, and well-labelled medical image datasets for training AI-driven healthcare solutions. It provides a resource for developing machine learning models for the automatic diagnosis, classification, and detection of GERD and polyps, which could improve the speed and accuracy of clinical decision-making. By using this dataset, researchers can contribute to better diagnostic tools that may improve healthcare outcomes and patients' quality of life in gastroenterology.
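The class counts reported above are internally consistent: each class total is exactly six times its primary count, and the four classes sum to 4006 primary and 24,036 total images. The minimal Python sketch below checks this arithmetic and reports each class's share of the final dataset, which can help when deciding whether class rebalancing is needed during training. The dictionary only restates the figures from the abstract; the names `CLASS_COUNTS` and `summarize` are illustrative and are not part of the released dataset.

```python
# Per-class counts as reported in the abstract:
# (primary images, images after augmentation). Illustrative restatement only.
CLASS_COUNTS = {
    "GERD": (974, 5844),
    "GERD Normal": (1103, 6618),
    "Polyp": (779, 4674),
    "Polyp Normal": (1150, 6900),
}


def summarize(counts):
    """Return the primary total, the overall total, each class's
    augmentation factor (total // primary), and each class's share
    of the final dataset."""
    total_primary = sum(primary for primary, _ in counts.values())
    total_images = sum(total for _, total in counts.values())
    # Integer division is safe here only because every reported total
    # is an exact multiple of its primary count.
    factors = {name: total // primary for name, (primary, total) in counts.items()}
    shares = {name: total / total_images for name, (_, total) in counts.items()}
    return total_primary, total_images, factors, shares


if __name__ == "__main__":
    primary, total, factors, shares = summarize(CLASS_COUNTS)
    print(f"primary images: {primary}, total images: {total}")
    for name in CLASS_COUNTS:
        print(f"{name:>12}: x{factors[name]}, {shares[name]:.1%} of the dataset")
```

Running it reproduces the 4006 and 24,036 totals and a factor of 6 for every class; the per-class shares show that Polyp is the smallest class and Polyp Normal the largest.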