dTagger: a POS tagger.

Authors

Divita Guy, Browne Allen C, Loane Russell

Affiliation

National Library of Medicine, Bethesda, Maryland, USA.

Publication

AMIA Annu Symp Proc. 2006;2006:200-3.

Abstract

The Lexical Systems Group at the National Library of Medicine (NLM) has developed a part-of-speech (POS) tagger, dTagger, to be freely distributed with the SPECIALIST NLP Tools. dTagger is designed specifically for use with the SPECIALIST lexicon, but it can be used with an arbitrary tag set. It is capable of single-word or multi-word chunking. It is trainable with previously annotated text, and a version that can be tuned with untagged text is in development. The tagger allows users to add local lexicon content, and it can report a likelihood for each sentence tagged. New words encountered while tagging (the unknowns) are handled by shape identification, including heuristics based on suffix statistics gleaned during training. With supervised training, the reported performance is 95% on a modified version of the MedPost hand-annotated Medline abstracts. Eight percent of the terms in this corpus were multi-word entities.
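
The suffix-based handling of unknown words mentioned in the abstract can be illustrated with a minimal sketch. This is not dTagger's implementation, which the abstract does not describe in detail. The function names, the tag names (`noun`, `verb`, `adj`, `num`), the digit-based shape rule, the three-character suffix limit, and the toy training data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_suffix_stats(tagged_words, max_suffix_len=3):
    """Count how often each word-final suffix co-occurs with each tag."""
    stats = defaultdict(Counter)
    for word, tag in tagged_words:
        lowered = word.lower()
        for n in range(1, min(max_suffix_len, len(lowered)) + 1):
            stats[lowered[-n:]][tag] += 1
    return stats


def guess_tag(word, stats, max_suffix_len=3, default="noun"):
    """Guess a tag for an unseen word: shape check first, then longest known suffix."""
    # Hypothetical shape rule: any token containing a digit is treated as numeric.
    if any(ch.isdigit() for ch in word):
        return "num"
    lowered = word.lower()
    # Back off from the longest suffix to the shortest.
    for n in range(min(max_suffix_len, len(lowered)), 0, -1):
        counts = stats.get(lowered[-n:])
        if counts:
            return counts.most_common(1)[0][0]
    return default


# Toy annotated data (hypothetical, not taken from the MedPost corpus).
training = [
    ("binding", "verb"), ("signaling", "verb"),
    ("cellular", "adj"), ("nuclear", "adj"),
    ("protein", "noun"), ("kinase", "noun"),
]
stats = train_suffix_stats(training)
```

In this sketch, `guess_tag("folding", stats)` falls back on the `ing` suffix counts collected during training. A production tagger would typically combine such suffix evidence with contextual probabilities rather than use it in isolation.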

References

1. MedPost: a part-of-speech tagger for bioMedical text. Bioinformatics. 2004 Sep 22;20(14):2320-1. doi: 10.1093/bioinformatics/bth227. Epub 2004 Apr 8.
