NegAIT: A new parser for medical text simplification using morphological, sentential and double negation.

Author information

Mukherjee Partha, Leroy Gondy, Kauchak David, Rajanarayanan Srinidhi, Romero Diaz Damian Y, Yuan Nicole P, Pritchard T Gail, Colina Sonia

Affiliations

University of Arizona, Tucson, AZ, United States.

Publication information

J Biomed Inform. 2017 May;69:55-62. doi: 10.1016/j.jbi.2017.03.014. Epub 2017 Mar 22.

Abstract

Many different text features influence text readability and content comprehension. Negation is commonly suggested as one such feature, but few general-purpose tools exist to detect negation, and studies of the impact of negation on text readability are rare. In this paper, we introduce a new negation parser (NegAIT) for detecting morphological, sentential, and double negation. We evaluated the parser on a human-annotated gold standard of 500 Wikipedia sentences and achieved 95%, 89%, and 67% precision with 100%, 80%, and 67% recall for morphological, sentential, and double negation, respectively. We also investigate two applications of the parser. First, we performed a corpus statistics study to compare negation usage in easy and difficult text. Negation usage was compared across six corpora: patient blogs (4K sentences), Cochrane reviews (91K sentences), PubMed abstracts (20K sentences), clinical trial texts (48K sentences), and English and Simple English Wikipedia articles on various medical topics (60K and 6K sentences, respectively). The most difficult text contained the least negation. However, when negation types were compared, the difficult texts (i.e., Cochrane, PubMed, English Wikipedia, and clinical trials) contained significantly more morphological negation (p<0.01). Second, we conducted a predictive analytics study to show the importance of negation in distinguishing between easy and difficult text. Five binary classifiers (Naïve Bayes, SVM, decision tree, logistic regression, and linear regression) were trained using only negation information. All classifiers outperformed the majority baseline. The Naïve Bayes classifier achieved the highest accuracy, 77% (9% higher than the majority baseline).
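The abstract does not describe NegAIT's rules or cue lexicons. As a rough illustration of the three negation types it targets, the following minimal Python sketch flags sentential negation from a small cue list (plus the contraction "n't"), morphological negation from negative affixes attached to known base words, and double negation as any sentence containing two or more cues. The cue lists, the base-word vocabulary, the two-cue definition of double negation, and the names `detect_negation`, `is_morphological`, and `NegationResult` are all hypothetical assumptions for illustration, not the parser's actual method.

```python
import re
from dataclasses import dataclass, field

# Hypothetical cue lists for illustration only; NegAIT's real lexicons are not given.
SENTENTIAL_CUES = {"not", "no", "never", "none", "nor", "neither",
                   "nobody", "nothing", "without"}
NEGATIVE_PREFIXES = ("non", "un", "in", "im", "il", "ir", "dis")
NEGATIVE_SUFFIX = "less"
# Tiny hypothetical base-word vocabulary; checking the stem against it avoids
# false hits such as "under", "index", or "discuss".
BASE_WORDS = {"common", "likely", "effective", "possible", "regular", "toxic",
              "infectious", "comfort", "pain", "harm", "symptom"}


@dataclass
class NegationResult:
    sentential: list[str] = field(default_factory=list)
    morphological: list[str] = field(default_factory=list)

    @property
    def double(self) -> bool:
        # Assumption: "double negation" = two or more cues of any type in one sentence.
        return len(self.sentential) + len(self.morphological) >= 2


def is_morphological(word: str) -> bool:
    """True if the word is a known base word carrying a negative prefix or '-less'."""
    for prefix in NEGATIVE_PREFIXES:
        if word.startswith(prefix) and word[len(prefix):] in BASE_WORDS:
            return True
    return word.endswith(NEGATIVE_SUFFIX) and word[:-len(NEGATIVE_SUFFIX)] in BASE_WORDS


def detect_negation(sentence: str) -> NegationResult:
    """Collect sentential and morphological negation cues in a single sentence."""
    result = NegationResult()
    for token in re.findall(r"[a-z']+", sentence.lower()):
        if token in SENTENTIAL_CUES or token.endswith("n't"):
            result.sentential.append(token)
        elif is_morphological(token):
            result.morphological.append(token)
    return result


r = detect_negation("The rash is not uncommon in children.")
print(r.sentential, r.morphological, r.double)  # ['not'] ['uncommon'] True
```

Requiring the stripped stem to appear in a vocabulary trades recall for precision: "uncommon" and "painless" are caught, while "under" and "index" are not. A realistic parser would need a much larger lexicon, and likely part-of-speech or dictionary checks, to approach the precision and recall reported above.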

Similar articles

Domain adaption of parsing for operative notes.
J Biomed Inform. 2015 Apr;54:1-9. doi: 10.1016/j.jbi.2015.01.016. Epub 2015 Feb 7.

Parsing clinical text: how good are the state-of-the-art parsers?
BMC Med Inform Decis Mak. 2015;15 Suppl 1(Suppl 1):S2. doi: 10.1186/1472-6947-15-S1-S2. Epub 2015 May 20.

Automatic negation detection in narrative pathology reports.
Artif Intell Med. 2015 May;64(1):41-50. doi: 10.1016/j.artmed.2015.03.001. Epub 2015 Mar 24.

Cited by

A survey of automated methods for biomedical text simplification.
J Am Med Inform Assoc. 2022 Oct 7;29(11):1976-1988. doi: 10.1093/jamia/ocac149.

Paragraph-level Simplification of Medical Texts.
Proc Conf. 2021 Jun;2021:4972-4984. doi: 10.18653/v1/2021.naacl-main.395.

Natural Language Processing for EHR-Based Computational Phenotyping.
IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):139-153. doi: 10.1109/TCBB.2018.2849968. Epub 2018 Jun 25.