Mukherjee Ratri, Jha Kishlay
Department of Electrical and Computer Engineering, University of Iowa, IA, USA.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2024 Dec;2024:3611-3614. doi: 10.1109/bibm62325.2024.10822585.
Biomedical text classification is the task of annotating a biomedical text with its relevant labels from a candidate label set. Most existing approaches operate in a fully supervised setting and thus rely heavily on human-annotated training data, which is both labor-intensive and expensive to produce. To address this, we formulate biomedical text classification under the zero-shot learning (ZSL) paradigm, which requires no labeled training data and relies only on label surface names for training and inference. Specifically, we propose a new context-aware contrastive learning technique for ZSL that exploits the context information present in biomedical text to generate the semantically enriched feature representations needed for accurate zero-shot biomedical text classification. Unlike existing contrastive learning approaches, which typically generate contrastive pairs through random text segmentation, our approach uses the context information inherent in biomedical text to generate semantically meaningful contrastive pairs. Extensive experiments on the largest available biomedical corpus validate the effectiveness of the proposed approach.
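The abstract does not specify how contrastive pairs are constructed or which loss is optimized, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that "context" can be approximated by pairing each sentence with the sentence that follows it, that training uses a standard InfoNCE objective with in-batch negatives, and that zero-shot inference picks the label whose surface-name embedding is closest to the text embedding. The function names (`adjacent_sentence_pairs`, `info_nce`, `zero_shot_predict`) and the toy two-dimensional vectors are hypothetical and stand in for a real text encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def adjacent_sentence_pairs(sentences):
    """Pair each sentence with its successor.

    Assumption: neighbouring sentences share context, so they form a
    positive pair. This replaces random text segmentation in the sketch.
    """
    return list(zip(sentences, sentences[1:]))


def info_nce(anchors, positives, temperature=0.1):
    """Standard InfoNCE loss over a batch of embedding vectors.

    positives[i] is the positive for anchors[i]; every other row of
    `positives` acts as an in-batch negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def zero_shot_predict(text_vec, label_vecs):
    """Return the label whose name embedding is most similar to the text."""
    return max(label_vecs, key=lambda name: cosine(text_vec, label_vecs[name]))


# Toy usage with hand-written vectors in place of encoder outputs.
pairs = adjacent_sentence_pairs(["s1", "s2", "s3"])
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
label = zero_shot_predict([0.9, 0.1], {"neoplasms": [1.0, 0.0], "infection": [0.0, 1.0]})
```

In this toy setup the loss is lower when each anchor is matched with its true positive than when the positives are shuffled, which is the signal a contrastive objective uses to pull contextually related text together. At inference time no labeled examples are needed, only an embedding of each label name.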