Towards comprehensive syntactic and semantic annotations of the clinical narrative.

Affiliation

Department of Linguistics, University of Colorado, Boulder, Colorado, USA.

Publication information

J Am Med Inform Assoc. 2013 Sep-Oct;20(5):922-30. doi: 10.1136/amiajnl-2012-001317. Epub 2013 Jan 25.

DOI: 10.1136/amiajnl-2012-001317
PMID: 23355458
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3756257/

Abstract

OBJECTIVE

To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components.

METHODS

Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed.

RESULTS

The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891-0.931), NE (0.697-0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations.

CONCLUSIONS

This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible.
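
The corpus figures reported under RESULTS can be cross-checked with simple arithmetic. The sketch below is illustrative only: the constants are copied from the abstract, while the variable names and the derived values (tokens per sentence, approximate per-category counts) are assumptions introduced here and are not numbers reported by the authors.

```python
# Figures copied from the RESULTS section of the abstract.
TOKENS = 127_606
SENTENCES = 13_091
NE_ANNOTATIONS = 28_539
NEW_FRAMES = 766
NEW_VERB_FRAMES = 74

# Reported share (%) of named entity annotations for the most frequent categories.
REPORTED_SHARE = {
    "Procedures": 15.71,
    "Disorders": 14.74,
    "Concepts and Ideas": 15.10,
    "Anatomy": 12.80,
    "Chemicals and Drugs": 7.49,
    "Sign or Symptom": 12.46,
}

# Derived values (not stated in the abstract).
tokens_per_sentence = TOKENS / SENTENCES
verb_frame_share = 100 * NEW_VERB_FRAMES / NEW_FRAMES
listed_share = sum(REPORTED_SHARE.values())

# Approximate annotation counts implied by the rounded percentages.
approx_counts = {
    name: round(NE_ANNOTATIONS * pct / 100)
    for name, pct in REPORTED_SHARE.items()
}

print(f"Average tokens per sentence: {tokens_per_sentence:.2f}")
print(f"Verb share of new PropBank frames: {verb_frame_share:.1f}%")
print(f"Share of NE annotations in the six listed categories: {listed_share:.2f}%")
for name, count in approx_counts.items():
    print(f"  {name}: ~{count} annotations")
```

Because the published percentages are rounded to two decimals, the per-category counts are estimates. The six listed categories account for roughly 78% of the named entity annotations, so the remaining semantic groups and the Person category share the rest.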

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b8b/3756257/b1dfd5c0c20a/amiajnl-2012-001317f01.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b8b/3756257/1b78dea0c1e9/amiajnl-2012-001317f02.jpg
