dTagger: a POS tagger.

Authors

Divita Guy, Browne Allen C, Loane Russell

Affiliation

National Library of Medicine, Bethesda, Maryland, USA.

Publication

AMIA Annu Symp Proc. 2006;2006:200-3.

PMID:17238331

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1839340/

Abstract

The Lexical Systems Group at the National Library of Medicine (NLM) has developed a part-of-speech (POS) tagger, dTagger, to be freely distributed with the SPECIALIST NLP Tools. dTagger is designed specifically for use with the SPECIALIST lexicon, but it can be used with an arbitrary tag set. It can chunk both single words and multi-word units. It can be trained on previously annotated text, and a version that can be tuned with untagged text is in development. Users can add local lexicon content to the tagger. It can report a likelihood for each sentence it tags. New words encountered while tagging (unknown words) are handled by shape identification, including heuristics based on suffix statistics gathered during training. With supervised training, performance was 95% on a modified version of the MedPost hand-annotated Medline abstracts. Eight percent of the terms in this corpus were multi-word entities.
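The abstract does not say how dTagger's unknown-word heuristics are implemented. The sketch below only illustrates the general idea of combining shape identification with suffix statistics gathered during training: count tags for each word-final suffix seen in annotated text, then, for an unseen word, back off from its longest known suffix to a tag distribution. The suffix length, the shape classes, the tag names, and the functions `word_shape`, `train_suffix_model`, and `guess_tags` are all hypothetical and are not dTagger's API or its actual algorithm.

```python
import re
from collections import Counter, defaultdict

MAX_SUFFIX = 4  # assumption: longest word-final suffix tracked

# Hypothetical: shape classes that fix the tag outright; all other shapes
# fall through to the suffix statistics.
SHAPE_TAGS = {"number": "num"}


def word_shape(word):
    """Coarse shape class for a token (hypothetical classes)."""
    if re.fullmatch(r"\d+(\.\d+)?", word):
        return "number"
    if word.isupper():
        return "all-caps"
    if word[:1].isupper():
        return "capitalized"
    return "lower"


def train_suffix_model(tagged_sentences, max_suffix=MAX_SUFFIX):
    """Count tag frequencies for every word-final suffix in (word, tag) data."""
    suffix_counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            w = word.lower()
            for k in range(1, min(max_suffix, len(w)) + 1):
                suffix_counts[w[-k:]][tag] += 1
    return suffix_counts


def guess_tags(word, suffix_counts, max_suffix=MAX_SUFFIX):
    """Return a tag -> probability dict for an unknown word ({} if no evidence)."""
    shape_tag = SHAPE_TAGS.get(word_shape(word))
    if shape_tag is not None:
        return {shape_tag: 1.0}
    w = word.lower()
    # Back off from the longest suffix to the shortest one seen in training.
    for k in range(min(max_suffix, len(w)), 0, -1):
        counts = suffix_counts.get(w[-k:])
        if counts:
            total = sum(counts.values())
            return {tag: n / total for tag, n in counts.items()}
    return {}


# Toy training data with illustrative tags (not a real annotated corpus).
training = [
    [("the", "det"), ("patient", "noun"), ("was", "aux"), ("treated", "verb")],
    [("renal", "adj"), ("function", "noun"), ("improved", "verb")],
]
model = train_suffix_model(training)
print(guess_tags("stabilized", model))  # {'verb': 1.0} via the suffix "ed"
print(guess_tags("42", model))          # {'num': 1.0} via the number shape
```

In a full tagger, such a distribution would stand in for the lexicon's word-to-tag probabilities when a word is missing from the lexicon. Backing off from longer to shorter suffixes favors the most specific evidence available.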

Similar articles

MedPost: a part-of-speech tagger for bioMedical text.

Bioinformatics. 2004 Sep 22;20(14):2320-1. doi: 10.1093/bioinformatics/bth227. Epub 2004 Apr 8.

Heuristic sample selection to minimize reference standard training set for a part-of-speech tagger.

J Am Med Inform Assoc. 2007 Sep-Oct;14(5):641-50. doi: 10.1197/jamia.M2392. Epub 2007 Jun 28.

Developing a corpus of clinical notes manually annotated for part-of-speech.

Int J Med Inform. 2006 Jun;75(6):418-29. doi: 10.1016/j.ijmedinf.2005.08.006. Epub 2005 Sep 19.

A token centric part-of-speech tagger for biomedical text.

Artif Intell Med. 2014 May;61(1):11-20. doi: 10.1016/j.artmed.2014.03.005. Epub 2014 Mar 26.

Domain adaption of parsing for operative notes.

J Biomed Inform. 2015 Apr;54:1-9. doi: 10.1016/j.jbi.2015.01.016. Epub 2015 Feb 7.

Ranking the whole MEDLINE database according to a large training set using text indexing.

BMC Bioinformatics. 2005 Mar 24;6:75. doi: 10.1186/1471-2105-6-75.

Comparing and combining chunkers of biomedical text.

J Biomed Inform. 2011 Apr;44(2):354-60. doi: 10.1016/j.jbi.2010.10.005. Epub 2010 Nov 4.

Improved part-of-speech prediction in suffix analysis.

PLoS One. 2013 Oct 4;8(10):e76042. doi: 10.1371/journal.pone.0076042. eCollection 2013.

Automatic term list generation for entity tagging.

Bioinformatics. 2006 Mar 15;22(6):651-7. doi: 10.1093/bioinformatics/bti733. Epub 2005 Oct 25.

Cited by

De-identification of Address, Date, and Alphanumeric Identifiers in Narrative Clinical Reports.

AMIA Annu Symp Proc. 2014 Nov 14;2014:767-76. eCollection 2014.

Part-of-speech tagging for clinical text: wall or bridge between institutions?

AMIA Annu Symp Proc. 2011;2011:382-91. Epub 2011 Oct 22.

Linking genes to literature: text mining, information extraction, and retrieval applications for biology.

Genome Biol. 2008;9 Suppl 2(Suppl 2):S8. doi: 10.1186/gb-2008-9-s2-s8. Epub 2008 Sep 1.

Heuristic sample selection to minimize reference standard training set for a part-of-speech tagger.

J Am Med Inform Assoc. 2007 Sep-Oct;14(5):641-50. doi: 10.1197/jamia.M2392. Epub 2007 Jun 28.

References

MedPost: a part-of-speech tagger for bioMedical text.

Bioinformatics. 2004 Sep 22;20(14):2320-1. doi: 10.1093/bioinformatics/bth227. Epub 2004 Apr 8.
