Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences.

Affiliation

Medical Informatics, Kaiser Permanente Southern California, Pasadena, California, USA.

Publication information

J Am Med Inform Assoc. 2013 Nov-Dec;20(6):1168-77. doi: 10.1136/amiajnl-2013-001810. Epub 2013 Aug 1.

DOI: 10.1136/amiajnl-2013-001810
PMID: 23907286
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3822122/

Abstract

OBJECTIVE

To develop, evaluate, and share: (1) syntactic parsing guidelines for clinical text, with a new approach to handling ill-formed sentences; and (2) a clinical Treebank annotated according to the guidelines. To document the process and findings for readers with similar interest.

METHODS

Using random samples from a shared natural language processing challenge dataset, we developed a handbook of domain-customized syntactic parsing guidelines based on iterative annotation and adjudication between two institutions. Special considerations were incorporated into the guidelines for handling ill-formed sentences, which are common in clinical text. Intra- and inter-annotator agreement rates were used to evaluate consistency in following the guidelines. Quantitative and qualitative properties of the annotated Treebank, as well as its use to retrain a statistical parser, were reported.

RESULTS

A supplement to the Penn Treebank II guidelines was developed for annotating clinical sentences. After three iterations of annotation and adjudication on 450 sentences, the annotators reached an F-measure agreement rate of 0.930 (while intra-annotator rate was 0.948) on a final independent set. A total of 1100 sentences from progress notes were annotated that demonstrated domain-specific linguistic features. A statistical parser retrained with combined general English (mainly news text) annotations and our annotations achieved an accuracy of 0.811 (higher than models trained purely with either general or clinical sentences alone). Both the guidelines and syntactic annotations are made available at https://sourceforge.net/projects/medicaltreebank.

CONCLUSIONS

We developed guidelines for parsing clinical text and annotated a corpus accordingly. The high intra- and inter-annotator agreement rates showed decent consistency in following the guidelines. The corpus was shown to be useful in retraining a statistical parser that achieved moderate accuracy.
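
The agreement rates reported above are F-measures over syntactic annotations. The abstract does not say exactly how they were computed, so the following is only a minimal sketch assuming PARSEVAL-style matching of labeled constituents between two annotators' bracketed trees. The function names (`labeled_spans`, `bracket_f_measure`) and the two example trees are hypothetical and are not taken from the paper or its corpus.

```python
from collections import Counter


def labeled_spans(bracketed):
    """Collect (label, start, end) constituents from a Penn-style bracketed tree.

    Assumes every "(" is directly followed by a label. Preterminals
    (part-of-speech nodes that directly cover a word) are skipped,
    as is usual in PARSEVAL-style scoring.
    """
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    spans = Counter()
    stack = []  # entries: [label, start word index, has child node]
    word_index = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            if stack:
                stack[-1][2] = True  # the parent has a child node
            stack.append([tokens[i + 1], word_index, False])
            i += 2
        elif tok == ")":
            label, start, has_child = stack.pop()
            if has_child:
                spans[(label, start, word_index)] += 1
            i += 1
        else:
            word_index += 1  # a word token
            i += 1
    return spans


def bracket_f_measure(tree_a, tree_b):
    """F-measure of labeled constituents shared by two annotations of one sentence."""
    spans_a, spans_b = labeled_spans(tree_a), labeled_spans(tree_b)
    matched = sum((spans_a & spans_b).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(spans_b.values())
    recall = matched / sum(spans_a.values())
    return 2 * precision * recall / (precision + recall)


# Invented example: two annotators disagree on how to bracket "chest pain".
annotator_1 = "(S (NP (NN Patient)) (VP (VBZ denies) (NP (NN chest) (NN pain))))"
annotator_2 = "(S (NP (NN Patient)) (VP (VBZ denies) (NP (NN chest)) (NP (NN pain))))"
agreement = bracket_f_measure(annotator_1, annotator_2)
```

Because precision and recall simply swap when the two annotators are exchanged, the F-measure is symmetric, which makes it usable as an agreement rate without designating either annotation as the gold standard. A corpus-level figure would normally pool matched and total constituent counts across all sentences rather than average per-sentence scores.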
