Luo Zhipeng, He Yazhou, Xue Yanbing, Wang Hongjun, Hauskrecht Milos, Li Tianrui
School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China.
Department of Oncology, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu 610041, China.
IEEE Trans Hum Mach Syst. 2023 Jun;53(3):581-589. doi: 10.1109/thms.2023.3252815. Epub 2023 Mar 23.
Learning classification models in practice usually requires large amounts of labeled data for training. However, instance-based annotation can be inefficient for humans to perform. In this article, we propose and study a new type of human supervision that is fast to perform and useful for model learning. Instead of labeling individual instances, humans provide supervision to data regions, which are subspaces of the input data space that represent subpopulations of the data. Because labeling is now performed at the region level, 0/1 labeling becomes imprecise. We therefore design the region label to be an assessment of the class proportion, which coarsely preserves labeling precision while remaining easy for humans to provide. To identify informative regions for labeling and learning, we further devise a process that recursively constructs a region hierarchy. This process is semisupervised in the sense that it is driven by both active learning strategies and human expertise, where humans can provide discriminative features. To evaluate our framework, we conducted extensive experiments on nine datasets as well as a real user study on a survival analysis of colorectal cancer patients. The results show that our region-based active learning framework outperforms many instance-based active learning methods.
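The region-level supervision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the axis-aligned box representation, the three coarse labels and their assumed proportions in `PROPORTION_BINS`, and the `p * (1 - p)` selection rule are all hypothetical choices made for illustration. The feature and threshold passed to `split` stand in for the discriminative feature a human expert might supply.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

# Hypothetical coarse labels, each mapped to an assumed proportion of positives.
PROPORTION_BINS = {"mostly negative": 0.1, "mixed": 0.5, "mostly positive": 0.9}


@dataclass
class Region:
    """Axis-aligned box; bounds[j] = (low, high) for feature j, high exclusive."""
    bounds: List[Tuple[float, float]]
    label: Optional[str] = None  # coarse class-proportion label from a human
    children: List["Region"] = field(default_factory=list)

    def contains(self, x: Sequence[float]) -> bool:
        return all(lo <= v < hi for v, (lo, hi) in zip(x, self.bounds))

    def split(self, feature: int, threshold: float) -> List["Region"]:
        """Refine this region into two children along one feature."""
        lo, hi = self.bounds[feature]
        if not lo < threshold < hi:
            raise ValueError("threshold must lie strictly inside the region")
        left, right = list(self.bounds), list(self.bounds)
        left[feature] = (lo, threshold)
        right[feature] = (threshold, hi)
        self.children = [Region(left), Region(right)]
        return self.children


def leaves(region: Region) -> List[Region]:
    """Return all regions at the bottom of the hierarchy."""
    if not region.children:
        return [region]
    return [leaf for child in region.children for leaf in leaves(child)]


def impurity(region: Region) -> float:
    """Assumed proxy for informativeness: largest when the region is mixed."""
    p = PROPORTION_BINS[region.label]
    return p * (1 - p)


def next_region_to_refine(root: Region) -> Region:
    """Pick the labeled leaf whose class proportion is most mixed."""
    labeled = [r for r in leaves(root) if r.label is not None]
    return max(labeled, key=impurity)


def predict_proba(root: Region, x: Sequence[float]) -> Optional[float]:
    """Proportion of the deepest labeled region containing x (x assumed inside root)."""
    node, p = root, None
    while True:
        if node.label is not None:
            p = PROPORTION_BINS[node.label]
        child = next((c for c in node.children if c.contains(x)), None)
        if child is None:
            return p
        node = child


# Usage: a two-feature unit square, labeled coarsely and then refined once.
root = Region([(0.0, 1.0), (0.0, 1.0)], label="mixed")
left, right = root.split(feature=0, threshold=0.5)
left.label = "mostly negative"
right.label = "mixed"
candidate = next_region_to_refine(root)  # the still-mixed right half
```

In this sketch each human answer covers a whole subpopulation rather than a single instance, and refinement is directed at the regions whose coarse label leaves the most uncertainty about individual instances.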