IEEE Trans Med Imaging. 2023 Aug;42(8):2348-2359. doi: 10.1109/TMI.2023.3248559. Epub 2023 Aug 1.
Leukemia classification relies on a detailed cytomorphological examination of bone marrow (BM) smears. However, applying existing deep-learning methods to this task faces two significant limitations. First, these methods require large-scale datasets with expert cell-level annotations to achieve good results, and they typically generalize poorly. Second, they treat the BM cytomorphological examination as a simple multi-class cell classification task and thus fail to exploit the correlations among leukemia subtypes across different hierarchies. As a result, BM cytomorphological examination, a time-consuming and repetitive process, still has to be performed manually by experienced cytologists. Recently, Multi-Instance Learning (MIL) has made considerable progress in data-efficient medical image processing, as it requires only patient-level labels, which can be extracted from clinical reports. In this paper, we propose a hierarchical MIL framework and equip it with an Information Bottleneck (IB) to tackle the above limitations. First, to handle patient-level labels, our hierarchical MIL framework uses attention-based learning to identify cells with high diagnostic value for leukemia classification at different hierarchies. Then, following the information bottleneck principle, we propose a hierarchical IB to constrain and refine the representations of the different hierarchies for better accuracy and generalization. By applying our framework to a large-scale childhood acute leukemia dataset with corresponding BM smear images and clinical reports, we show that it can identify diagnosis-related cells without cell-level annotations and that it outperforms the compared methods. Furthermore, an evaluation on an independent test cohort demonstrates the high generalizability of our framework.
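To illustrate the attention-based MIL idea mentioned in the abstract, the following is a minimal sketch of generic attention pooling over a bag of cell features, in which each cell receives a weight and the patient-level representation is the weighted sum. It is not the paper's hierarchical architecture and does not include the information bottleneck term. The function name `attention_pool`, the parameters `V` and `w`, the feature sizes, and the random values standing in for learned parameters are all assumptions made for illustration.

```python
import math
import random


def attention_pool(instances, V, w):
    """Aggregate per-cell feature vectors into one patient-level vector.

    instances: list of K feature vectors (each of length D), one per cell.
    V: hypothetical L x D projection matrix, given as a list of rows.
    w: hypothetical length-L scoring vector.
    Returns (bag_vector, attention_weights).
    """
    # Score each cell: w^T tanh(V h).
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))

    # Softmax over cells, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Patient-level representation: attention-weighted sum of cell features.
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# Toy bag: 5 cells with 4-dimensional features (random stand-ins, not real data).
rng = random.Random(0)
D, L, K = 4, 3, 5
V = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(L)]
w = [rng.uniform(-1, 1) for _ in range(L)]
cells = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(K)]
bag, weights = attention_pool(cells, V, w)
```

In a trained model of this kind, the cells with the largest attention weights would be the ones contributing most to the patient-level prediction, which is how such a scheme can highlight diagnosis-related cells without cell-level labels.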