Comparison of various approaches to tagging for the inflectional Slovak language.

Authors

Benko Lubomír, Munkova Dasa, Pappová Mária, Munk Michal

Affiliations

Department of Computer Science, Constantine the Philosopher University in Nitra, Nitra, Slovakia.

Science and Research Centre, University of Pardubice, Pardubice, Czech Republic.

Publication information

PeerJ Comput Sci. 2024 May 24;10:e2026. doi: 10.7717/peerj-cs.2026. eCollection 2024.

DOI: 10.7717/peerj-cs.2026
PMID: 38855261
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11157559/
Abstract

Morphological tagging provides essential insights into grammar, structure, and the mutual relationships of words within a sentence. Tagging text in a highly inflectional language is challenging because of word ambiguity. This research compares six automatic taggers for the inflectional Slovak language, seeking the most accurate tagger for literary and non-literary texts. Our results indicate that it is useful to separate texts into literary and non-literary and then deploy a tagger according to the text style. For literary texts, UDPipe2 outperformed the others in seven of the nine examined tagset positions. Conversely, for non-literary texts, RNNTagger achieved the highest performance in eight of the nine examined tagset positions. RNNTagger is recommended for both types of text because it best captures the inflection of Slovak, although UDPipe2 is more accurate for literary texts. Despite dataset size limitations, this study highlights the suitability of various taggers for inflectional languages such as Slovak.
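
The abstract reports results per tagset position, which suggests that each tagger's output was scored one position at a time. The following is only a minimal sketch of that kind of comparison, not the authors' evaluation code. It assumes positional tags (one character per grammatical category, padded with "-"), and the function names, tagger names, and three-position toy tags are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence


def per_position_accuracy(gold: Sequence[str], predicted: Sequence[str],
                          n_positions: int = 9, pad: str = "-") -> List[float]:
    """Share of tokens whose predicted tag matches the gold tag at each position."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of tokens")
    if not gold:
        raise ValueError("at least one token is required")
    hits = [0] * n_positions
    for g, p in zip(gold, predicted):
        # Pad or truncate so that every tag has exactly n_positions characters.
        g = g.ljust(n_positions, pad)[:n_positions]
        p = p.ljust(n_positions, pad)[:n_positions]
        for i in range(n_positions):
            if g[i] == p[i]:
                hits[i] += 1
    return [h / len(gold) for h in hits]


def best_tagger_per_position(gold: Sequence[str],
                             outputs: Dict[str, Sequence[str]],
                             n_positions: int = 9) -> List[str]:
    """Name of the most accurate tagger at each position (ties go to the first listed)."""
    scores = {name: per_position_accuracy(gold, tags, n_positions)
              for name, tags in outputs.items()}
    return [max(scores, key=lambda name: scores[name][i])
            for i in range(n_positions)]


# Toy data with three positions (hypothetical tags, not taken from the study).
gold_tags = ["NFS", "VSP", "AMS"]
tagger_outputs = {
    "tagger_a": ["NFS", "VSP", "AFS"],  # one error at position 1
    "tagger_b": ["NFP", "VSP", "AMS"],  # one error at position 2
}

accuracy_a = per_position_accuracy(gold_tags, tagger_outputs["tagger_a"], n_positions=3)
winners = best_tagger_per_position(gold_tags, tagger_outputs, n_positions=3)
print(accuracy_a)
print(winners)
```

Counting how often each name appears in `winners` gives a tally of the "best in k out of n positions" kind. Running the same comparison separately on literary and non-literary samples would mirror the split described in the abstract.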

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6d1/11157559/197161594ba1/peerj-cs-10-2026-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6d1/11157559/cfe20b0a1b3a/peerj-cs-10-2026-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6d1/11157559/af2c6c062c7f/peerj-cs-10-2026-g002.jpg

Similar articles

1. Comparison of various approaches to tagging for the inflectional Slovak language.
   PeerJ Comput Sci. 2024 May 24;10:e2026. doi: 10.7717/peerj-cs.2026. eCollection 2024.
2. A token centric part-of-speech tagger for biomedical text.
   Artif Intell Med. 2014 May;61(1):11-20. doi: 10.1016/j.artmed.2014.03.005. Epub 2014 Mar 26.
3. A universal multilingual weightless neural network tagger via quantitative linguistics.
   Neural Netw. 2017 Jul;91:85-101. doi: 10.1016/j.neunet.2017.04.011. Epub 2017 Apr 26.
4. Multilingual part-of-speech tagging with weightless neural networks.
   Neural Netw. 2015 Jun;66:11-21. doi: 10.1016/j.neunet.2015.02.012. Epub 2015 Mar 2.
5. Part-of-speech tagging for clinical text: wall or bridge between institutions?
   AMIA Annu Symp Proc. 2011;2011:382-91. Epub 2011 Oct 22.
6. Performance analysis of a POS tagger applied to discharge summaries in Portuguese.
   Stud Health Technol Inform. 2010;160(Pt 2):959-63.
7. Evaluating automatic sentence alignment approaches on English-Slovak sentences.
   Sci Rep. 2023 Nov 17;13(1):20123. doi: 10.1038/s41598-023-47479-w.
8. Exploring Spanish writing abilities of children with developmental language disorder in expository texts.
   Front Psychol. 2024 Apr 11;15:1360245. doi: 10.3389/fpsyg.2024.1360245. eCollection 2024.
9. Improving part-of-speech tagging in Amharic language using deep neural network.
   Heliyon. 2023 Jun 21;9(7):e17175. doi: 10.1016/j.heliyon.2023.e17175. eCollection 2023 Jul.
10. Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation.
    J Am Med Inform Assoc. 2013 Sep-Oct;20(5):931-9. doi: 10.1136/amiajnl-2012-001453. Epub 2013 Mar 13.