Anaphoric relations in the clinical narrative: corpus creation.

Affiliation

Children's Hospital Boston Informatics Program and Harvard Medical School, Boston, Massachusetts 02114, USA.

Publication information

J Am Med Inform Assoc. 2011 Jul-Aug;18(4):459-65. doi: 10.1136/amiajnl-2011-000108. Epub 2011 Apr 1.

DOI: 10.1136/amiajnl-2011-000108
PMID: 21459927
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3128403/

Abstract

OBJECTIVE

The long-term goal of this work is the automated discovery of anaphoric relations from the clinical narrative. The creation of a gold standard set from a cross-institutional corpus of clinical notes and high-level characteristics of that gold standard are described.

METHODS

A standard methodology for annotation guideline development, gold standard annotations, and inter-annotator agreement (IAA) was used.
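
The abstract does not say which agreement metric was used. As a minimal sketch, assuming IAA is computed as a pairwise F-measure over the anaphoric pairs two annotators marked (a common choice when the set of possible annotations is open-ended), the calculation could look like the following. The function name `pairwise_f_measure` and the sample pairs are hypothetical and are not taken from the corpus.

```python
def pairwise_f_measure(annotator_a, annotator_b):
    """Agreement between two annotators as F1 over exactly matching items.

    Each argument is an iterable of hashable annotations, here
    (anaphor, antecedent) tuples. F1 is symmetric, so it does not
    matter which annotator is treated as the reference.
    """
    a, b = set(annotator_a), set(annotator_b)
    if not a and not b:
        return 1.0  # both annotated nothing: treat as full agreement
    matched = len(a & b)
    if matched == 0:
        return 0.0
    precision = matched / len(b)
    recall = matched / len(a)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for one report (illustrative only).
ann_a = {("it", "the mass"), ("the lesion", "the mass"), ("this", "chest pain")}
ann_b = {("it", "the mass"), ("this", "chest pain"), ("the pain", "chest pain")}

score = pairwise_f_measure(ann_a, ann_b)
print(f"pairwise IAA (F1): {score:.4f}")
```

Under this assumption, partial overlaps (for example, matching anaphor but different span boundaries) count as disagreements; a more lenient matching rule would raise the score.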

RESULTS

The gold standard annotations resulted in 7214 markables, 5992 pairs, and 1304 chains. Each report averaged 40 anaphoric markables, 33 pairs, and seven chains. The overall IAA is high on the Mayo dataset (0.6607), and moderate on the University of Pittsburgh Medical Center (UPMC) dataset (0.4072). The IAA between each annotator and the gold standard is high (Mayo: 0.7669, 0.7697, and 0.9021; UPMC: 0.6753 and 0.7138). These results imply a quality corpus feasible for system development. They also suggest the complementary nature of the annotations performed by the experts and the importance of an annotator team with diverse knowledge backgrounds.
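
The three counts above (markables, pairs, chains) are related: a pair links two markables, and a chain groups markables that are connected through pairs. As a rough sketch, assuming a chain is simply a connected component of the pair links (the abstract does not spell out the exact definition, and the corpus may treat some relation types differently), chains could be derived from pairs like this. The identifiers `build_chains` and `m1` to `m5` are hypothetical.

```python
from collections import defaultdict


def build_chains(pairs):
    """Group markables into chains using union-find over pair links.

    `pairs` is an iterable of (anaphor, antecedent) markable ids.
    Returns a sorted list of chains, each a sorted list of ids.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for anaphor, antecedent in pairs:
        root_a, root_b = find(anaphor), find(antecedent)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for markable in list(parent):
        groups[find(markable)].add(markable)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical pair annotations for one report (illustrative only).
pairs = [("m2", "m1"), ("m3", "m2"), ("m5", "m4")]
chains = build_chains(pairs)
print(chains)
```

Here three pairs over five markables yield two chains, which mirrors how the corpus can report fewer chains than pairs.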

LIMITATIONS

Only one of the annotators had the linguistic background necessary for annotation of the linguistic attributes. The overall generalizability of the guidelines will be further strengthened by annotations of data from additional sites. This will increase the overall corpus size and the representation of each relation type.

CONCLUSION

The first step toward the development of an anaphoric relation resolver as part of a comprehensive natural language processing system geared specifically for the clinical narrative in the electronic medical record is described. The deidentified annotated corpus will be available to researchers.
