Aubreville Marc, Bertram Christof A, Donovan Taryn A, Marzahl Christian, Maier Andreas, Klopfleisch Robert
Pattern Recognition Lab, Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
Institute of Veterinary Pathology, Freie Universität Berlin, Berlin, Germany.
Sci Data. 2020 Nov 27;7(1):417. doi: 10.1038/s41597-020-00756-z.
Canine mammary carcinoma (CMC) has been used as a model to investigate the pathogenesis of human breast cancer, and the same grading scheme is commonly used to assess tumor malignancy in both. One key component of this grading scheme is the density of mitotic figures (MF). Current publicly available datasets on human breast cancer only provide annotations for small subsets of whole slide images (WSIs). We present a novel dataset of 21 WSIs of CMC completely annotated for MF. To create it, a pathologist screened all WSIs for potential MF and structures with a similar appearance. A second expert blindly assigned labels, and where the labels did not match, a third expert assigned the final label. Additionally, we used machine learning to identify previously undetected MF. Finally, we performed representation learning and two-dimensional projection to further increase the consistency of the annotations. Our dataset consists of 13,907 MF and 36,379 hard negatives. We achieved a mean F1-score of 0.791 on the test set and up to 0.696 on a human breast cancer dataset.
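The three-expert labeling procedure described in the abstract can be illustrated with a minimal Python sketch. It assumes that each candidate cell carries one label from the first expert and one from the second, and that a third label is requested only when those two disagree. The function name `consensus_label`, the label strings, and the example candidates are hypothetical and are not taken from the published dataset or its tooling.

```python
from typing import Optional

# Hypothetical label names, chosen for illustration only.
MF = "mitotic figure"
NEG = "hard negative"


def consensus_label(first: str, second: str, third: Optional[str] = None) -> str:
    """Return the final label for one candidate cell.

    If the first two experts agree, their shared label is final.
    Otherwise the third expert's label decides, as in the abstract.
    """
    if first == second:
        return first
    if third is None:
        raise ValueError("experts disagree; a third opinion is required")
    return third


# Made-up candidates: (first expert, second expert, third expert or None).
candidates = [
    (MF, MF, None),    # agreement, no third opinion needed
    (MF, NEG, NEG),    # disagreement, third expert decides
    (NEG, MF, MF),     # disagreement, third expert decides
    (NEG, NEG, None),  # agreement on a hard negative
]

final = [consensus_label(*c) for c in candidates]
counts = {label: final.count(label) for label in (MF, NEG)}
```

This sketch covers only the manual consensus step. The machine-learning search for previously undetected MF and the representation-learning consistency check mentioned in the abstract are not modeled here.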