Feature engineering for sentiment analysis in e-health forums.

Author information

UNED IR & NLP Group, Madrid, Spain.

Publication information

PLoS One. 2018 Nov 29;13(11):e0207996. doi: 10.1371/journal.pone.0207996. eCollection 2018.

Abstract

INTRODUCTION

Exploiting information in health-related social media services is of great interest to patients, researchers and medical companies. The challenge, however, is to provide easy, quick and relevant access to the vast amount of information available. One step towards facilitating access to online health data is opinion mining. Although the classification of patient opinions into positive and negative has been tackled before, most works rely on machine learning methods over bag-of-words representations. Our first contribution is an extensive evaluation of different features, including lexical, syntactic, semantic, network-based, sentiment-based and word-embedding features, for representing patient-authored texts in polarity classification. The second contribution is the study of polar facts (i.e. objective information with polar connotations). Traditionally, the presence of polar facts has been neglected, and research in polarity classification has been restricted to opinionated texts. We demonstrate the existence and importance of polar facts for the polarity classification of health information.

MATERIAL AND METHODS

We annotate a set of more than 3500 posts from online health forums on breast cancer, Crohn's disease and different allergies. Each sentence in a post is manually labeled as "experience", "fact" or "opinion", and as "positive", "negative" or "neutral". Using this data, we train different machine learning algorithms and compare traditional bag-of-words representations with word embeddings combined with lexical, syntactic, semantic, network-based and emotional properties of the texts, in order to automatically classify patient-authored content as positive, negative or neutral. In addition, we experiment with a combination of textual and semantic representations by generating concept embeddings using the UMLS Metathesaurus.
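To make the contrast between the two representation families concrete, the following is a minimal sketch, not the authors' implementation. It builds a bag-of-words count vector and an averaged word-embedding vector for a sentence. The embedding table `TOY_EMBEDDINGS`, its two-dimensional values, and the function names are all hypothetical and chosen only for illustration; a real system would load pretrained vectors and append the additional lexical, syntactic or sentiment features.

```python
from collections import Counter

# Hypothetical, hand-made 2-dimensional "embeddings" (illustration only;
# these values are NOT taken from the paper or from any pretrained model).
TOY_EMBEDDINGS = {
    "pain": [-0.9, 0.1],
    "worse": [-0.8, 0.2],
    "relief": [0.9, 0.1],
    "better": [0.8, 0.3],
    "treatment": [0.0, 0.9],
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(text, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def mean_embedding(text, embeddings):
    """Average the vectors of all known tokens; zeros if none are known."""
    dim = len(next(iter(embeddings.values())))
    vectors = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


if __name__ == "__main__":
    vocabulary = sorted(TOY_EMBEDDINGS)
    sentence = "The treatment gave me relief from the pain"
    print(bag_of_words(sentence, vocabulary))
    print(mean_embedding(sentence, TOY_EMBEDDINGS))
```

The bag-of-words vector grows with the vocabulary and treats every word as unrelated to every other, whereas the averaged embedding has a fixed, small dimension and places similar words close together. Either vector could then be passed to a standard classifier.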

RESULTS

We report two main results. First, the polarity of patient-authored content can be predicted with very high accuracy (≈ 70 percent) using word embeddings, which considerably outperforms more traditional representations such as bags of words. Second, in medical information, negative and positive facts (i.e. objective information) are nearly as frequent as negative and positive opinions and experiences (i.e. subjective information), and they are crucial for polarity classification.
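The two-layer annotation scheme (sentence type plus polarity) and the notion of a polar fact can be illustrated with a small sketch. The class `AnnotatedSentence`, the function `polar_counts` and the example sentences are hypothetical and invented for illustration; they are not drawn from the annotated corpus.

```python
from collections import Counter
from dataclasses import dataclass

SENTENCE_TYPES = {"experience", "fact", "opinion"}
POLARITIES = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class AnnotatedSentence:
    text: str
    kind: str      # one of SENTENCE_TYPES
    polarity: str  # one of POLARITIES

    def __post_init__(self):
        # Reject labels outside the annotation scheme described in the abstract.
        if self.kind not in SENTENCE_TYPES:
            raise ValueError(f"unknown sentence type: {self.kind}")
        if self.polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {self.polarity}")


def polar_counts(sentences):
    """Count non-neutral sentences, split into polar facts (objective)
    and polar opinions/experiences (subjective)."""
    counts = Counter()
    for s in sentences:
        if s.polarity == "neutral":
            continue
        group = "polar_fact" if s.kind == "fact" else "polar_subjective"
        counts[group] += 1
    return counts


if __name__ == "__main__":
    # Made-up example sentences, for illustration only.
    sample = [
        AnnotatedSentence("This drug can cause severe nausea.", "fact", "negative"),
        AnnotatedSentence("I felt much better after a week.", "experience", "positive"),
        AnnotatedSentence("I think the new diet is useless.", "opinion", "negative"),
        AnnotatedSentence("The clinic opens at nine.", "fact", "neutral"),
    ]
    print(polar_counts(sample))
```

A classifier trained only on opinionated sentences would never see the first example, even though it carries a clear negative connotation, which is the gap the study of polar facts addresses.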

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47b3/6264154/cd4094836f16/pone.0207996.g001.jpg
