Institute of Transport and Territory, Universitat Politècnica de València, Valencia, Spain.
Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain.
Comput Biol Med. 2022 Aug;147:105714. doi: 10.1016/j.compbiomed.2022.105714. Epub 2022 Jun 10.
Multiple instance learning (MIL) deals with data grouped into bags of instances, for which only global, bag-level information is known. In recent years, this weakly supervised learning paradigm has become very popular in histological image analysis because it alleviates the burden of labeling all cancerous regions of large Whole Slide Images (WSIs) in detail. However, these methods require large datasets to perform well, and many approaches focus only on simple binary classification. This often does not match real-world problems, where multi-label settings are frequent and constraints may need to be taken into account. In this work, we propose a novel multi-label MIL formulation based on inequality constraints that can incorporate prior knowledge about instance proportions. Our method is theoretically grounded in optimization with log-barrier extensions, applied to bag-level class proportions, which encourages the model to respect the proportion ordering during training. Extensive experiments on SICAP-MIL, a new public dataset for prostate cancer WSI analysis, demonstrate that, by using the prior proportion information, we can achieve instance-level results similar to those of supervised methods on datasets of similar size. Compared with prior MIL settings, our method yields improvements of ∼13% in instance-level accuracy and ∼3% in the multi-label mean area under the ROC curve at the bag level.
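The idea of penalizing violations of a proportion ordering with a log-barrier extension can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`log_barrier_extension`, `proportion_ordering_penalty`), the use of mean instance probabilities as bag-level proportions, and the barrier parameter `t=5.0` are all assumptions made for illustration. The barrier follows the commonly used piecewise form, which is a logarithmic barrier for well-satisfied constraints and a linear continuation otherwise.

```python
import numpy as np


def log_barrier_extension(z, t=5.0):
    """Smooth penalty for an inequality constraint z <= 0 (illustrative).

    For z <= -1/t**2 this is the standard log-barrier -log(-z)/t.
    Above that threshold it continues linearly, so the penalty stays
    finite (and differentiable) even when the constraint is violated.
    """
    z = np.asarray(z, dtype=float)
    threshold = -1.0 / t**2
    # np.where evaluates both branches, so clip z before taking the log.
    safe_z = np.minimum(z, threshold)
    barrier = -np.log(-safe_z) / t
    linear = t * z - np.log(1.0 / t**2) / t + 1.0 / t
    return np.where(z <= threshold, barrier, linear)


def proportion_ordering_penalty(instance_probs, ordered_classes, t=5.0):
    """Penalty encouraging predicted bag proportions to respect an ordering.

    instance_probs: array of shape (n_instances, n_classes) with the
        per-instance class probabilities of one bag (assumed input).
    ordered_classes: class indices sorted from the most to the least
        frequent class in the bag, according to prior knowledge.
    """
    # Assumption: bag-level proportions are the mean instance probabilities.
    proportions = np.asarray(instance_probs, dtype=float).mean(axis=0)
    penalty = 0.0
    for larger, smaller in zip(ordered_classes[:-1], ordered_classes[1:]):
        # Constraint: proportion[smaller] - proportion[larger] <= 0
        z = proportions[smaller] - proportions[larger]
        penalty += float(log_barrier_extension(z, t))
    return penalty


# Toy bag with two instances and three classes (made-up numbers).
bag = np.array([[0.7, 0.2, 0.1],
                [0.6, 0.3, 0.1]])
respected = proportion_ordering_penalty(bag, [0, 1, 2])
violated = proportion_ordering_penalty(bag, [2, 1, 0])
```

In this toy bag the predicted proportions already follow the ordering `[0, 1, 2]`, so `respected` is smaller than `violated`. In a training loop, a term of this kind would typically be added to the bag-level classification loss, with `t` possibly increased over epochs to tighten the barrier.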