Document Sublanguage Clustering to Detect Medical Specialty in Cross-institutional Clinical Texts.

Authors

Doing-Harris Kristina, Patterson Olga, Igo Sean, Hurdle John

Affiliations

Department of Biomedical Informatics, University of Utah, Health Sciences Center, Salt Lake City, UT.

VA SLC Health Care, Salt Lake City, UT.

Publication information

Proc ACM Int Workshop Data Text Min Biomed Inform. 2013 Oct-Nov;2013:9-12. doi: 10.1145/2512089.2512101.

DOI:10.1145/2512089.2512101
PMID:27077137
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4827341/
Abstract

This paper reports on a set of studies designed to identify sublanguages in documents for domain-specific processing across institutions. Psychological evidence indicates that humans use context-specific linguistic information when they read. Natural Language Processing (NLP) pipelines are successful within specific domains (i.e., contexts). To limit the number of domain-specific NLP systems, a natural focus would be on sublanguages. Sublanguages are identified by shared lexical and semantic features.[1] Patterson and Hurdle[2] developed a sublanguage identification system that functioned well for 12 clinical specialties at the University of Utah. The current work compares sublanguages across institutions. Using a clinical NLP pipeline augmented by a new document corpus from the University of Pittsburgh (UPitt), new documents were assigned to clusters based on the minimum cosine-distance to a Utah cluster centroid. The UPitt documents were divided into a nine-group specialty corpus. Across institutions, five of the specialty groups fell within the expected clusters. We find that clustering encounters difficulty due to documents with mixed sublanguages; naming convention differences across institutions; and document types used across specialties. The findings indicate that clinical specialty sublanguages can be identified across institutions.
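
The assignment step described in the abstract (placing a new document in the cluster whose centroid has the minimum cosine distance) can be illustrated with a minimal sketch. This is not the authors' pipeline: the feature vectors, the specialty labels, and the function names `cosine_distance` and `assign_to_nearest_centroid` are hypothetical and chosen only for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: an all-zero vector is treated as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def assign_to_nearest_centroid(doc_vector, centroids):
    """Return the label of the centroid closest to doc_vector."""
    return min(
        centroids,
        key=lambda label: cosine_distance(doc_vector, centroids[label]),
    )


# Hypothetical centroids, standing in for clusters learned at one institution.
centroids = {
    "cardiology": [0.9, 0.1, 0.0],
    "radiology": [0.1, 0.8, 0.3],
}

# Hypothetical feature vector for a document from another institution.
new_doc = [0.7, 0.2, 0.1]
label = assign_to_nearest_centroid(new_doc, centroids)
print(label)
```

In practice the vectors would come from lexical and semantic features extracted by an NLP pipeline; the three-dimensional toy vectors here only show the distance comparison.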

Cited by

1
Dynamic few-shot prompting for clinical note section classification using lightweight, open-source large language models.
J Am Med Inform Assoc. 2025 Jul 1;32(7):1164-1173. doi: 10.1093/jamia/ocaf084.
2
Contextual Variation of Clinical Notes induced by EHR Migration.
AMIA Annu Symp Proc. 2024 Jan 11;2023:1155-1164. eCollection 2023.
3
Collecting specialty-related medical terms: Development and evaluation of a resource for Spanish.
BMC Med Inform Decis Mak. 2021 May 4;21(1):145. doi: 10.1186/s12911-021-01495-w.
4
Ensembles of natural language processing systems for portable phenotyping solutions.
J Biomed Inform. 2019 Dec;100:103318. doi: 10.1016/j.jbi.2019.103318. Epub 2019 Oct 23.
5
Detecting Secular Trends in Clinical Treatment through Temporal Analysis.
J Med Syst. 2019 Feb 12;43(3):74. doi: 10.1007/s10916-019-1173-0.
6
Trie-based rule processing for clinical NLP: A use-case study of n-trie, making the ConText algorithm more efficient and scalable.
J Biomed Inform. 2018 Sep;85:106-113. doi: 10.1016/j.jbi.2018.08.002. Epub 2018 Aug 6.
7
Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach.
BMC Med Inform Decis Mak. 2017 Dec 1;17(1):155. doi: 10.1186/s12911-017-0556-8.

References

1
Using domain knowledge and domain-inspired discourse model for coreference resolution for clinical narratives.
J Am Med Inform Assoc. 2013 Mar-Apr;20(2):356-62. doi: 10.1136/amiajnl-2011-000767. Epub 2012 Jul 10.
2
Document clustering of clinical narratives: a systematic study of clinical sublanguages.
AMIA Annu Symp Proc. 2011;2011:1099-107. Epub 2011 Oct 22.
3
Automatic acquisition of sublanguage semantic schema: towards the word sense disambiguation of clinical narratives.
AMIA Annu Symp Proc. 2010 Nov 13;2010:612-6.
4
An overview of MetaMap: historical perspective and recent advances.
J Am Med Inform Assoc. 2010 May-Jun;17(3):229-36. doi: 10.1136/jamia.2009.002733.
5
Linking genes to literature: text mining, information extraction, and retrieval applications for biology.
Genome Biol. 2008;9 Suppl 2(Suppl 2):S8. doi: 10.1186/gb-2008-9-s2-s8. Epub 2008 Sep 1.
6
An electronic health record based on structured narrative.
J Am Med Inform Assoc. 2008 Jan-Feb;15(1):54-64. doi: 10.1197/jamia.M2131. Epub 2007 Oct 18.
7
Data clustering in life sciences.
Mol Biotechnol. 2005 Sep;31(1):55-80. doi: 10.1385/MB:31:1:055.
8
Two biomedical sublanguages: a description based on the theories of Zellig Harris.
J Biomed Inform. 2002 Aug;35(4):222-35. doi: 10.1016/s1532-0464(03)00012-1.
9
The structure of science information.
J Biomed Inform. 2002 Aug;35(4):215-21. doi: 10.1016/s1532-0464(03)00011-x.
10
Comparing syntactic complexity in medical and non-medical corpora.
Proc AMIA Symp. 2001:90-4.