
Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text.

Author Information

Park Albert, Hartzler Andrea L, Huh Jina, McDonald David W, Pratt Wanda

Affiliations

Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, WA, United States.

Publication Information

J Med Internet Res. 2015 Aug 31;17(8):e212. doi: 10.2196/jmir.4612.

Abstract

BACKGROUND

The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. Beyond being built for different types of text, existing NLP tools also face constantly changing technologies, source vocabularies, and text characteristics. These continuously evolving challenges call for low-cost, systematic assessment. However, the primary accepted evaluation method in NLP, manual annotation, requires tremendous time and effort.

OBJECTIVE

The primary objective of this study is to explore an alternative approach: using low-cost, automated methods to detect failures (eg, incorrect boundaries, missed terms, mismapped concepts) when processing patient-generated text with existing biomedical NLP tools. We first characterize common failures that NLP tools can make in processing online community text. We then demonstrate the feasibility of our automated approach in detecting these common failures using one of the most popular biomedical NLP tools, MetaMap.

METHODS

Using 9657 posts from an online cancer community, we explored our automated failure detection approach in two steps: (1) to characterize the failure types, we first manually reviewed MetaMap's commonly occurring failures, grouped the inaccurate mappings into failure types, and then identified causes of the failures through iterative rounds of manual review using open coding; and (2) to automatically detect these failure types, we explored combinations of existing NLP techniques and dictionary-based matching for each failure cause. Finally, we manually evaluated the automatically detected failures.
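The abstract does not specify the detection rules, so the following is only a minimal sketch of how dictionary-based matching might flag two of the failure types described below. The mapping representation, both dictionaries, and the detect_failures function are illustrative assumptions, not the authors' actual pipeline or resources.

```python
# Hypothetical sketch of dictionary-based failure detection.
# Assumes each MetaMap mapping is a (matched_text, cui, concept_name) tuple;
# both dictionaries are illustrative stand-ins for curated resources.

AMBIGUOUS_LAY_WORDS = {"cold", "discharge", "positive"}  # everyday words MetaMap may mismap
COMMUNITY_TERMS = {"chemo", "mets", "onc"}               # informal terms MetaMap may miss

def detect_failures(mappings, post_tokens):
    """Flag likely word ambiguity and missed term failures in one post."""
    failures = []
    mapped_text = {text.lower() for text, _, _ in mappings}
    for text, cui, concept in mappings:
        # Word ambiguity: an everyday word mapped to a clinical concept.
        if text.lower() in AMBIGUOUS_LAY_WORDS:
            failures.append(("word_ambiguity", text, cui, concept))
    for token in post_tokens:
        # Missed term: an informal community term that was never mapped.
        if token.lower() in COMMUNITY_TERMS and token.lower() not in mapped_text:
            failures.append(("missed_term", token, None, None))
    return failures

# Toy example: one post with one MetaMap mapping.
post = "Started chemo last week and caught a cold".split()
maps = [("cold", "C0009443", "Common Cold")]
print(detect_failures(maps, post))
```

In a real pipeline, each of the 12 failure causes would get its own rule of this kind, which is what makes the approach cheap to rerun as the tools and vocabularies evolve.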

RESULTS

From our manual review, we characterized three types of failure: (1) boundary failures, (2) missed term failures, and (3) word ambiguity failures. Within these three failure types, we discovered 12 causes of inaccurate concept mappings. Our automated methods flagged almost half of MetaMap's 383,572 mappings as problematic. Word ambiguity failures were the most common, comprising 82.22% of failures; boundary failures were the second most frequent at 15.90%, and missed term failures were the least common at 1.88%. The automated failure detection achieved precision, recall, accuracy, and F1 score of 83.00%, 92.57%, 88.17%, and 87.52%, respectively.
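The four reported metrics follow the standard confusion-matrix definitions, sketched below for reference. The counts passed in the example are made up purely to show the computation and are not the study's data.

```python
def evaluation_metrics(tp: int, fp: int, tn: int, fn: int):
    """Standard precision, recall, accuracy, and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

# Illustrative counts only (not from the study):
print(evaluation_metrics(tp=80, fp=20, tn=85, fn=15))
```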

CONCLUSIONS

We illustrate the challenges of processing patient-generated online health community text and characterize the failures that NLP tools make on this patient-generated health text, demonstrating the feasibility of our low-cost approach to automatically detect those failures. Our approach shows the potential for scalable and effective solutions to automatically assess constantly evolving NLP tools and source vocabularies for processing patient-generated text.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1662/4642409/9773c6201a4e/jmir_v17i8e212_fig1.jpg
