Graduate School of Health Sciences, Hokkaido University, Sapporo, Japan.
Graduate School of Medicine, Hokkaido University, Sapporo, Japan.
Sci Rep. 2022 Oct 6;12(1):16736. doi: 10.1038/s41598-022-20651-4.
Differential bone marrow (BM) cell counting is an important test for the diagnosis of various hematological diseases. However, it is difficult to classify BM cells accurately because differential counting is non-uniform and lacks reproducibility. Automatic classification systems based on deep learning have therefore been developed. These systems require large, accurately labeled datasets for training. To ease this requirement, we used semi-supervised learning (SSL), in which training proceeds while the data are being labeled. We used three methods, self-training (ST), active learning (AL), and a combination of the two, and attempted to automatically classify images of 16 types of BM cells. In our ST, as in AL, data are verified before being added to the training dataset (confirmed self-training, CST). After 25 rounds of CST, AL, and CST + AL, the training set grew from an initial 425 images to 40,518, 3,682, and 47,843 images, respectively. Accuracy on test data comprising 50 images per cell type was 0.944, 0.941, and 0.976, respectively. Data added with CST or AL alone showed some imbalance between classes, while CST + AL showed less. We suggest that CST + AL, which combines the two SSL methods, is an efficient way to increase training data for the development of automatic BM cell classification systems.
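One round of the combined CST + AL selection described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the selection criteria, so the confidence threshold `high`, the per-round budget `al_budget`, and the function names `select_round`, `verify`, and `annotate` are all hypothetical. The sketch assumes CST takes high-confidence predictions and keeps only those a human confirms, while AL sends the least confident remaining images to a human for labeling.

```python
from typing import Callable, Dict, Sequence, Tuple


def max_prob(probs: Sequence[float]) -> Tuple[int, float]:
    """Return (predicted class index, probability of that class)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs[best]


def select_round(
    probabilities: Dict[str, Sequence[float]],
    verify: Callable[[str, int], bool],
    annotate: Callable[[str], int],
    high: float = 0.95,  # hypothetical CST confidence threshold
    al_budget: int = 2,  # hypothetical number of images labeled per AL round
) -> Dict[str, int]:
    """Return newly labeled {image_id: class_index} for one CST + AL round."""
    scored = {img: max_prob(p) for img, p in probabilities.items()}
    new_labels: Dict[str, int] = {}

    # CST (assumed rule): take confident pseudo-labels, but add them only
    # if the human verifier confirms the predicted class.
    for img, (label, conf) in scored.items():
        if conf >= high and verify(img, label):
            new_labels[img] = label

    # AL (assumed rule): among images not added by CST, ask a human to
    # label the ones the model is least confident about.
    remaining = [img for img in scored if img not in new_labels]
    remaining.sort(key=lambda img: scored[img][1])
    for img in remaining[:al_budget]:
        new_labels[img] = annotate(img)

    return new_labels


# Toy example with 3 classes; `truth` stands in for the human expert.
truth = {"a": 0, "b": 1, "c": 2, "d": 1}
probs = {
    "a": [0.98, 0.01, 0.01],  # confident and correct: added by CST
    "b": [0.97, 0.02, 0.01],  # confident but wrong: rejected by verifier
    "c": [0.40, 0.35, 0.25],  # least confident: labeled by AL
    "d": [0.50, 0.45, 0.05],  # uncertain, but outside the AL budget of 1
}
added = select_round(
    probs,
    verify=lambda img, label: truth[img] == label,
    annotate=lambda img: truth[img],
    al_budget=1,
)
print(added)
```

In a full loop, the returned labels would be appended to the training set and the classifier retrained before the next round. Under these assumed rules, CST tends to add images from classes the model already recognizes well and AL adds images from classes it finds hard, which is one plausible reading of why the combination showed less class imbalance.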