Welch E Celeste, Lu Chenhao, Sung C James, Zhang Cunxian, Tripathi Anubhav, Ou Joyce
Center for Biomedical Engineering, School of Engineering, Brown University, Providence, RI, 02912, USA.
Department of Computer Science, Brown University, Providence, RI, 02912, USA.
Sci Data. 2024 Dec 28;11(1):1444. doi: 10.1038/s41597-024-04328-3.
In the past several years, a few cervical Pap smear datasets have been published for use in clinical training. However, most publicly available datasets consist of pre-segmented single-cell images, contain on-image annotations that must be manually edited out, or are prepared using the conventional Pap smear method. Multicellular liquid Pap image datasets more accurately reflect current cervical screening techniques. While a multicellular liquid SurePath™ dataset has been created, machine learning models struggle to classify test images that were prepared differently from the training set, because the preparation methods yield visually different images. Therefore, this dataset of multicellular Pap smear images prepared with the more common ThinPrep® protocol is presented as a resource for training and testing artificial intelligence models, particularly for future application in cervical dysplasia diagnosis. The "Brown Multicellular ThinPrep" (BMT) dataset is the first publicly available multicellular ThinPrep® dataset. It consists of 600 clinically vetted images collected from 180 Pap smear slides from 180 patients, classified into three key diagnostic categories.