Spanhol Fabio A, Oliveira Luiz S, Petitjean Caroline, Heutte Laurent
IEEE Trans Biomed Eng. 2016 Jul;63(7):1455-62. doi: 10.1109/TBME.2015.2496264. Epub 2015 Oct 30.
Today, medical image analysis papers require solid experiments to prove the usefulness of proposed methods. However, experiments are often performed on data selected by the researchers, which may come from different institutions, scanners, and populations. Moreover, different evaluation measures may be used, making it difficult to compare methods. In this paper, we introduce a dataset of 7909 breast cancer histopathology images acquired from 82 patients, which is now publicly available from http://web.inf.ufpr.br/vri/breast-cancer-database. The dataset includes both benign and malignant images. The task associated with this dataset is the automated classification of these images into two classes, which would be a valuable computer-aided diagnosis tool for the clinician. To assess the difficulty of this task, we report preliminary results obtained with state-of-the-art image classification systems. Accuracy ranges from 80% to 85%, leaving room for improvement. By providing this dataset and a standardized evaluation protocol to the scientific community, we hope to bring together researchers in the medical and machine learning fields to advance toward this clinical application.