Modeling Sequential Annotations for Sequence Labeling With Crowds.

Publication information

IEEE Trans Cybern. 2023 Apr;53(4):2335-2345. doi: 10.1109/TCYB.2021.3117700. Epub 2023 Mar 16.

Abstract

Crowd sequential annotations can be an efficient and cost-effective way to build large datasets for sequence labeling. Unlike tagging independent instances, in crowd sequential annotation the quality of a label sequence depends on each annotator's expertise in capturing the internal dependencies among the tokens in the sequence. In this article, we propose modeling sequential annotation for sequence labeling with crowds (SA-SLC). First, a conditional probabilistic model is developed to jointly model sequential data and annotators' expertise, in which a categorical distribution is introduced to estimate the reliability of each annotator in capturing local and nonlocal label dependencies for sequential annotation. To accelerate the marginalization of the proposed model, a valid label sequence inference (VLSE) method is proposed to derive the valid ground-truth label sequences from crowd sequential annotations. VLSE derives possible ground-truth labels at the token level and then prunes subpaths during forward inference for label sequence decoding. This reduces the number of candidate label sequences and improves the quality of the possible ground-truth label sequences. Experimental results on several natural language processing sequence labeling tasks show the effectiveness of the proposed model.
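
The two-step idea behind VLSE (collect token-level candidate labels, then prune subpaths in a forward pass) can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not say what makes a label sequence "valid", so the sketch assumes BIO-style tags and treats a sequence as valid when every `I-X` tag follows `B-X` or `I-X`. The function names `token_candidates`, `is_valid_transition`, and `valid_sequences` are hypothetical.

```python
from typing import List, Set


def token_candidates(annotations: List[List[str]]) -> List[Set[str]]:
    """Per-token candidates: every label some annotator assigned at that position."""
    length = len(annotations[0])
    assert all(len(seq) == length for seq in annotations), "sequences must align"
    return [{seq[i] for seq in annotations} for i in range(length)]


def is_valid_transition(prev: str, curr: str) -> bool:
    """Assumed validity rule (BIO scheme): I-X may only follow B-X or I-X."""
    if curr.startswith("I-"):
        entity = curr[2:]
        return prev in (f"B-{entity}", f"I-{entity}")
    return True


def valid_sequences(candidates: List[Set[str]]) -> List[List[str]]:
    """Forward pass that extends partial paths and drops invalid subpaths early."""
    # Under the assumed BIO rule, a sequence cannot start with an I- tag.
    paths = [[label] for label in sorted(candidates[0]) if not label.startswith("I-")]
    for position in candidates[1:]:
        paths = [
            path + [label]
            for path in paths
            for label in sorted(position)
            if is_valid_transition(path[-1], label)
        ]
    return paths


# Three hypothetical annotators labeling the same three-token sentence.
crowd = [
    ["B-PER", "I-PER", "O"],
    ["B-PER", "O", "O"],
    ["O", "I-PER", "O"],
]
candidates = token_candidates(crowd)
paths = valid_sequences(candidates)

total = 1
for labels in candidates:
    total *= len(labels)
print(f"{len(paths)} of {total} candidate sequences survive pruning")
for path in paths:
    print(path)
```

In this toy case the token-level candidates allow four label sequences, and the forward pass discards the one in which `I-PER` follows `O`. The sketch covers only the candidate-reduction step; the annotator reliability model described in the abstract is not represented here.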
