Zunic Anastazia, Corcoran Padraig, Spasic Irena
School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom.
JMIR Med Inform. 2020 Jan 28;8(1):e16023. doi: 10.2196/16023.
Sentiment analysis (SA) is a subfield of natural language processing whose aim is to automatically classify the sentiment expressed in free text. It has found practical applications across a wide range of societal contexts, including marketing, economics, and politics. This review focuses specifically on applications related to health, which is defined as "a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity."
This study aimed to establish the state of the art in SA related to health and well-being by conducting a systematic review of the recent literature. To capture the perspective of those individuals whose health and well-being are affected, we focused specifically on spontaneously generated content and not necessarily that of health care professionals.
Our methodology is based on the guidelines for performing systematic reviews. In January 2019, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified a total of 86 relevant studies and extracted data about the datasets analyzed, discourse topics, data creators, downstream applications, algorithms used, and their evaluation.
The majority of data were collected from social networking and Web-based retailing platforms. The primary purpose of these online conversations is to exchange information and provide social support. These communities tend to form around health conditions with high severity and chronicity rates. The treatments and services discussed include medications, vaccination, surgery, orthodontic services, individual physicians, and health care services in general. We identified 5 roles with respect to health and well-being among the authors of the types of spontaneously generated narratives considered in this review: a sufferer, an addict, a patient, a carer, and a suicide victim. Out of the 86 studies considered, only 4 reported demographic characteristics. A wide range of methods were used to perform SA. The most common choices included support vector machines, naïve Bayesian learning, decision trees, logistic regression, and adaptive boosting. In contrast with general trends in SA research, only 1 study used deep learning. Performance, measured by F-score, was found to be below 60% on average, which lags behind the state of the art achieved in other domains. In the context of SA, the domain of health and well-being was found to be resource poor: few domain-specific corpora and lexica are shared publicly for research purposes.
SA results in the area of health and well-being lag behind those in other domains. It remains unclear whether this is because of the intrinsic differences between the domains and their respective sublanguages, the size of training datasets, the lack of domain-specific sentiment lexica, or the choice of algorithms.
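For readers unfamiliar with the F-score used above to compare SA performance, the following is a minimal sketch of how it is commonly computed from gold and predicted sentiment labels. The function names, the toy labels, and the choice of macro-averaging over classes are illustrative assumptions; the abstract does not state which averaging scheme the reviewed studies used.

```python
def f_score(gold, predicted, positive_label, beta=1.0):
    """F-beta score for one class, computed from two parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive_label and p == positive_label)
    fp = sum(1 for g, p in pairs if g != positive_label and p == positive_label)
    fn = sum(1 for g, p in pairs if g == positive_label and p != positive_label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    # Weighted harmonic mean of precision and recall (beta=1 gives F1).
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f_score(gold, predicted):
    """Unweighted mean of per-class F1 over the classes present in gold (an assumed averaging scheme)."""
    labels = sorted(set(gold))
    return sum(f_score(gold, predicted, label) for label in labels) / len(labels)


# Hypothetical toy data, not taken from any reviewed study.
gold = ["pos", "pos", "neg", "neg", "neu", "pos"]
predicted = ["pos", "neg", "neg", "pos", "neu", "pos"]

print(f"F1 (pos):  {f_score(gold, predicted, 'pos'):.3f}")
print(f"Macro F1:  {macro_f_score(gold, predicted):.3f}")
```

Because the F-score is a harmonic mean, it stays low when either precision or recall is low, which makes it a stricter summary than accuracy on imbalanced sentiment classes.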