Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences.

Affiliation

Medical Informatics, Kaiser Permanente Southern California, Pasadena, California, USA.

Publication information

J Am Med Inform Assoc. 2013 Nov-Dec;20(6):1168-77. doi: 10.1136/amiajnl-2013-001810. Epub 2013 Aug 1.

Abstract

OBJECTIVE

To develop, evaluate, and share: (1) syntactic parsing guidelines for clinical text, with a new approach to handling ill-formed sentences; and (2) a clinical Treebank annotated according to the guidelines. To document the process and findings for readers with similar interest.

METHODS

Using random samples from a shared natural language processing challenge dataset, we developed a handbook of domain-customized syntactic parsing guidelines based on iterative annotation and adjudication between two institutions. Special considerations were incorporated into the guidelines for handling ill-formed sentences, which are common in clinical text. Intra- and inter-annotator agreement rates were used to evaluate consistency in following the guidelines. Quantitative and qualitative properties of the annotated Treebank, as well as its use to retrain a statistical parser, were reported.

RESULTS

A supplement to the Penn Treebank II guidelines was developed for annotating clinical sentences. After three iterations of annotation and adjudication on 450 sentences, the annotators reached an F-measure agreement rate of 0.930 (while intra-annotator rate was 0.948) on a final independent set. A total of 1100 sentences from progress notes were annotated that demonstrated domain-specific linguistic features. A statistical parser retrained with combined general English (mainly news text) annotations and our annotations achieved an accuracy of 0.811 (higher than models trained purely with either general or clinical sentences alone). Both the guidelines and syntactic annotations are made available at https://sourceforge.net/projects/medicaltreebank.
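The abstract does not say how the F-measure agreement was computed. One common choice for comparing two syntactic annotations of the same sentence is a PARSEVAL-style score over labeled brackets, and the following is a minimal sketch under that assumption. The function names (`labeled_spans`, `bracket_f1`) and the example sentence are hypothetical illustrations, not taken from the paper or its guidelines, and the sketch assumes Penn Treebank-style bracketing in which every bracket carries a label.

```python
from collections import Counter


def labeled_spans(tree_str):
    """Collect (label, start, end) for every phrase-level bracket.

    Preterminals such as (NN pain) are skipped, so only constituents
    that contain at least one bracketed child are counted.
    """
    tokens = tree_str.replace("(", " ( ").replace(")", " ) ").split()
    spans = Counter()
    stack = []       # entries: [label, start_word_index, has_bracketed_child]
    word_index = 0   # number of words seen so far
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            # The token right after "(" is the node label.
            stack.append([tokens[i + 1], word_index, False])
            i += 2
        elif tok == ")":
            label, start, has_child = stack.pop()
            if has_child:
                spans[(label, start, word_index)] += 1
            if stack:
                stack[-1][2] = True
            i += 1
        else:
            word_index += 1
            i += 1
    return spans


def bracket_f1(tree_a, tree_b):
    """Labeled-bracket F-measure between two annotations of one sentence."""
    a, b = labeled_spans(tree_a), labeled_spans(tree_b)
    matched = sum((a & b).values())  # multiset intersection
    if matched == 0:
        return 0.0
    precision = matched / sum(b.values())
    recall = matched / sum(a.values())
    return 2 * precision * recall / (precision + recall)


# Two hypothetical annotations of the same four-word sentence.
annot_1 = "(S (NP (NN Patient)) (VP (VBZ denies) (NP (NN chest) (NN pain))))"
annot_2 = "(S (NP (NN Patient)) (VP (VBZ denies) (NP (NN chest)) (NP (NN pain))))"

print(round(bracket_f1(annot_1, annot_2), 3))
```

Here the two annotations share three of their labeled brackets (S, the subject NP, and VP) and disagree on how "chest pain" is bracketed. Because F-measure is symmetric in precision and recall, it does not matter which annotator is treated as the reference; a corpus-level score would pool matched and total bracket counts over all sentences before computing the ratio.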

CONCLUSIONS

We developed guidelines for parsing clinical text and annotated a corpus accordingly. The high intra- and inter-annotator agreement rates showed decent consistency in following the guidelines. The corpus was shown to be useful in retraining a statistical parser that achieved moderate accuracy.
