Coreferential Relations in Basque: The Annotation Process.

Author information

Ceberio Klara, Aduriz Itziar, Díaz de Ilarraza Arantza, Garcia-Azkoaga Ines

Affiliations

IXA Group, Faculty of Informatics, UPV-EHU, Donostia, Spain.

IXA Group, Department of Catalan Philology and General Linguistics, Universitat de Barcelona, Barcelona, Spain.

Publication information

J Psycholinguist Res. 2018 Apr;47(2):325-342. doi: 10.1007/s10936-018-9559-6.

PMID:29399705
Abstract

In this paper we present the coreferential tagging of part of the EPEC Corpus of Basque. Although coreference is a pragmatic linguistic phenomenon that depends heavily on situational context, it shows some language-specific patterns that vary according to the features of each language. Because Basque is not an Indo-European language, its grammar differs considerably from that of the languages spoken in the surrounding areas. We explain these features and the decisions made in each case. After describing the criteria defined for coreferential tagging in Basque, we explain the annotation process. Our annotation is based on a morphologically and syntactically annotated corpus that provides a manageable environment in which the specific structures that form part of a reference chain can be identified more easily. Part of the corpus was tagged by two annotators who marked up the same text independently, and by a third annotator who acted as judge, resolving cases of disagreement. This entire process has been automated as a result of previous studies carried out in this field. The automatic detection of mentions (Soraluze et al., in: Proceedings of Konvens, 2012) has provided a better working environment and has made it possible to build a first significant corpus for later computational work on automatic coreference resolution.
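The double-annotation workflow described in the abstract (two independent annotators, with a third acting as judge on disagreements, and accepted links forming reference chains) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' tooling or the EPEC annotation format: mentions are assumed to be token spans, annotations are assumed to be pairwise coreference links, the `judge` callable stands in for the human adjudicator, and `link`, `adjudicate`, and `build_chains` are illustrative names. Accepted links are grouped into chains with a union-find structure.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Hypothetical representation: a mention is a token span (start, end), end exclusive.
Mention = Tuple[int, int]
# Hypothetical representation: a coreference link is an unordered pair of mentions.
Link = FrozenSet[Mention]


def link(a: Mention, b: Mention) -> Link:
    """Create an unordered link between two distinct mentions."""
    if a == b:
        raise ValueError("a link must join two distinct mentions")
    return frozenset((a, b))


def adjudicate(
    links_a: Set[Link],
    links_b: Set[Link],
    judge: Callable[[Link], bool],
) -> Tuple[Set[Link], Set[Link]]:
    """Accept links both annotators marked; let the judge rule on the rest.

    Returns (accepted links, disputed links).
    """
    agreed = links_a & links_b
    disputed = links_a ^ links_b  # marked by exactly one annotator
    accepted = agreed | {lk for lk in disputed if judge(lk)}
    return accepted, disputed


def build_chains(links: Set[Link]) -> List[List[Mention]]:
    """Group linked mentions into reference chains with union-find."""
    parent: Dict[Mention, Mention] = {}

    def find(m: Mention) -> Mention:
        parent.setdefault(m, m)
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for lk in links:
        x, y = tuple(lk)
        parent[find(x)] = find(y)  # merge the two mentions' chains

    chains: Dict[Mention, List[Mention]] = {}
    for m in list(parent):
        chains.setdefault(find(m), []).append(m)
    # Sort mentions within each chain, then sort chains, for stable output.
    return sorted(sorted(chain) for chain in chains.values())


if __name__ == "__main__":
    ann_a = {link((0, 1), (5, 6)), link((5, 6), (9, 10))}
    ann_b = {link((0, 1), (5, 6)), link((9, 10), (12, 13))}
    # Stand-in for the human judge: accept only the (5, 6)-(9, 10) link.
    accepted, disputed = adjudicate(
        ann_a, ann_b, judge=lambda lk: lk == link((5, 6), (9, 10))
    )
    print(len(disputed))           # 2
    print(build_chains(accepted))  # [[(0, 1), (5, 6), (9, 10)]]
```

Comparing annotators link by link is a simplifying choice of this sketch; an actual adjudication scheme might instead compare mentions or whole chains.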

Similar articles

1. Coreferential Relations in Basque: The Annotation Process.
   J Psycholinguist Res. 2018 Apr;47(2):325-342. doi: 10.1007/s10936-018-9559-6.
2. EUSKOR: End-to-end coreference resolution system for Basque.
   PLoS One. 2019 Sep 12;14(9):e0221801. doi: 10.1371/journal.pone.0221801. eCollection 2019.
3. Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text.
   PLoS One. 2016 Mar 2;11(3):e0148538. doi: 10.1371/journal.pone.0148538. eCollection 2016.
4. A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC.
   J Am Med Inform Assoc. 2015 Sep;22(5):948-56. doi: 10.1093/jamia/ocv037. Epub 2015 May 6.
5. Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles.
   BMC Bioinformatics. 2017 Aug 17;18(1):372. doi: 10.1186/s12859-017-1775-9.
6. Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences.
   J Am Med Inform Assoc. 2013 Nov-Dec;20(6):1168-77. doi: 10.1136/amiajnl-2013-001810. Epub 2013 Aug 1.
7. Task definition, annotated dataset, and supervised natural language processing models for symptom extraction from unstructured clinical notes.
   J Biomed Inform. 2020 Feb;102:103354. doi: 10.1016/j.jbi.2019.103354. Epub 2019 Dec 12.
8. Anaphoric relations in the clinical narrative: corpus creation.
   J Am Med Inform Assoc. 2011 Jul-Aug;18(4):459-65. doi: 10.1136/amiajnl-2011-000108. Epub 2011 Apr 1.
9. Building a comprehensive syntactic and semantic corpus of Chinese clinical texts.
   J Biomed Inform. 2017 May;69:203-217. doi: 10.1016/j.jbi.2017.04.006. Epub 2017 Apr 9.
10. Minimalistic Approach to Coreference Resolution in Lithuanian Medical Records.
    Comput Math Methods Med. 2019 Mar 20;2019:9079840. doi: 10.1155/2019/9079840. eCollection 2019.

Cited by

1. An extensive review of tools for manual annotation of documents.
   Brief Bioinform. 2021 Jan 18;22(1):146-163. doi: 10.1093/bib/bbz130.
2. EUSKOR: End-to-end coreference resolution system for Basque.
   PLoS One. 2019 Sep 12;14(9):e0221801. doi: 10.1371/journal.pone.0221801. eCollection 2019.