Department of Data Science and Visualization, Faculty of Informatics, University of Debrecen, Debrecen, Hungary.
Department of Pathology, Kenezy Gyula University Hospital and Clinic, University of Debrecen, Debrecen, Hungary.
Sci Data. 2024 Jul 7;11(1):743. doi: 10.1038/s41597-024-03596-3.
Machine learning-based systems have become an important part of global efforts to combat cervical cancer. A growing area of research applies artificial intelligence to cervical screening, primarily through the examination of Pap smears, a task that traditionally relies on meticulous, labor-intensive analysis by specialized experts. Although some comprehensive and readily accessible datasets exist, the field is currently constrained by the limited number of publicly available images and smears. To address this, we present APACC (Annotated PAp cell images and smear slices for Cell Classification), a comprehensive dataset designed to bridge this gap. It comprises 103,675 annotated cell images extracted from 107 whole smears, which are further divided into 21,371 sub-regions for finer-grained analysis. The dataset provides cell images from conventional Pap smears together with their specific locations on each smear, offering a valuable resource for in-depth investigation and study.