Wu Stephen, Liu Hongfang
Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
AMIA Annu Symp Proc. 2011;2011:1550-8. Epub 2011 Oct 22.
Natural language processing (NLP) has become crucial in unlocking information stored in free text, from both clinical notes and biomedical literature. Clinical notes convey clinical information related to individual patient health care, while biomedical literature communicates scientific findings. This work focuses on semantic characterization of texts at an enterprise scale, comparing and contrasting the two domains and their NLP approaches. We analyzed the empirical distributional characteristics of NLP-discovered named entities in Mayo Clinic clinical notes from 2001 to 2010, and in the 2011 MetaMapped Medline Baseline. We give qualitative and quantitative measures of domain similarity and point to the feasibility of transferring resources and techniques between the two domains. An important by-product of this study is the development of a weighted ontology for each domain, which gives distributional semantic information that may be used to improve NLP applications.
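The abstract does not say which quantitative similarity measure was used. As a hedged illustration only, one common way to compare two domains is to treat each corpus as a frequency distribution over extracted concepts and compute the Jensen-Shannon divergence (JSD) between them. The sketch below assumes this measure. The function names and the toy concept counts are hypothetical and are not taken from the study.

```python
import math
from collections import Counter

def normalize(counts: Counter) -> dict:
    """Turn raw concept counts into a probability distribution."""
    total = sum(counts.values())
    return {concept: n / total for concept, n in counts.items()}

def jensen_shannon(p: dict, q: dict) -> float:
    """Jensen-Shannon divergence in bits: 0 = identical, 1 = disjoint concepts.

    Assumed similarity measure for illustration; not necessarily the paper's.
    """
    concepts = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the union of concepts.
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in concepts}

    def kl_to_m(a: dict) -> float:
        # KL(a || m); concepts with zero probability in a contribute nothing.
        return sum(a[c] * math.log2(a[c] / m[c]) for c in a if a[c] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

# Hypothetical toy counts of extracted concepts per domain (not study data).
clinical = Counter({"hypertension": 50, "diabetes": 30, "pain": 20})
literature = Counter({"hypertension": 10, "diabetes": 20, "gene expression": 70})

p, q = normalize(clinical), normalize(literature)
print(round(jensen_shannon(p, q), 3))
```

Lower values indicate more similar concept distributions. JSD is symmetric and bounded, which makes it convenient for comparing corpora whose concept sets only partially overlap.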