Rodrigues Ramon Gouveia, das Dores Rafael Marques, Camilo-Junior Celso G, Rosa Thierson Couto
Instituto de Informática, Universidade Federal de Goiás, PO Box 131, CEP 74001-970, Brazil.
Int J Med Inform. 2016 Jan;85(1):80-95. doi: 10.1016/j.ijmedinf.2015.09.007. Epub 2015 Oct 16.
Cancer is a critical disease that affects millions of people and families around the world. In 2012, about 14.1 million new cases of cancer occurred globally. For many reasons, such as the severity of some cases, the side effects of some treatments, and the deaths of other patients, cancer patients tend to be affected by serious emotional disorders such as depression. Monitoring patients' mood is therefore an important part of their treatment. Many cancer patients use online social networks, and many of them take part in virtual cancer communities, where they exchange messages commenting on their treatment or offering support to other patients in the community. Most of these communities are publicly accessible and are thus useful sources of information about patients' mood. Sentiment Analysis methods can therefore help to automatically detect the positive or negative mood of cancer patients by analyzing their messages in these online communities.
The objective of this work is to present a Sentiment Analysis tool, named SentiHealth-Cancer (SHC-pt), that improves the detection of the emotional state of patients in Brazilian online cancer communities by inspecting their posts written in Portuguese. SHC-pt is a sentiment analysis tool tailored specifically to classify messages posted by patients in online communities of cancer patients as positive, negative, or neutral. We conducted a comparative study of the proposed method against a set of general-purpose sentiment analysis tools adapted to this context.
Different collections of posts were obtained from two cancer communities on Facebook. The posts were analyzed by sentiment analysis tools that support Portuguese (Semantria and SentiStrength) and by SHC-pt, the tool built on the method proposed in this paper, called SentiHealth. As a second way of analyzing the Portuguese texts, the collected posts were also automatically translated into English and submitted to sentiment analysis tools that do not support Portuguese (AlchemyAPI and Textalytics), as well as to Semantria and SentiStrength using the English option of these tools. Six experiments were conducted, varying the experimental setup and the origin of the collected posts. The results were measured with the following metrics: precision, recall, F1-measure, and accuracy.
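The four metrics named above can be illustrated with a small Python sketch for the three-class setting (positive, negative, neutral). This is an assumption-laden illustration, not the authors' evaluation code: the abstract does not say how per-class scores were averaged, so the sketch assumes macro-averaging (an unweighted mean of the per-class F1 values), and the function name `evaluate` and the toy labels are hypothetical.

```python
from collections import Counter

# The three sentiment classes addressed in the study.
CLASSES = ("positive", "negative", "neutral")


def evaluate(gold, predicted, classes=CLASSES):
    """Return per-class precision/recall/F1, macro-averaged F1, and accuracy.

    Hypothetical helper. Assumption: macro-averaging, since the abstract
    does not specify the averaging scheme.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct label: true positive for that class
        else:
            fp[p] += 1   # wrongly predicted class gets a false positive
            fn[g] += 1   # missed gold class gets a false negative

    per_class = {}
    for c in classes:
        # Convention: an undefined ratio (zero denominator) counts as 0.0.
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class[c] = {"precision": prec, "recall": rec, "f1": f1}

    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(classes)
    accuracy = sum(tp.values()) / len(gold) if gold else 0.0
    return per_class, macro_f1, accuracy


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    gold = ["positive", "negative", "neutral", "positive"]
    pred = ["positive", "neutral", "neutral", "negative"]
    per_class, macro_f1, accuracy = evaluate(gold, pred)
    print(per_class)
    print(f"macro F1 = {macro_f1:.3f}, accuracy = {accuracy:.2f}")
```

With these toy labels, the positive and neutral classes each get F1 = 2/3 and the negative class gets 0, so the macro-averaged F1 is 4/9 (about 0.444) while accuracy is 2/4 = 0.5. Because macro-averaging weights each class equally, a tool that rarely gets one class right is penalized even if its overall accuracy looks reasonable.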
The proposed tool SHC-pt achieved the best average accuracy and F1-measure (the harmonic mean of precision and recall) across the three sentiment classes addressed (positive, negative, and neutral) in all experimental settings. Moreover, the worst accuracy achieved by SHC-pt in any experiment (58%) is 11.53% higher, in relative terms, than the best accuracy achieved by any of the other tools (52%). Likewise, the worst average F1 reached by SHC-pt in any experiment (48.46%) is 4.14% higher, in relative terms, than the best average F1 achieved by any of the other tools (46.53%). Thus, even when SHC-pt's results in its most difficult scenario are compared with the other tools' results in their easiest scenario, SHC-pt still performs better.
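The percentages in the preceding paragraph are relative gains rather than percentage-point differences. A short worked check, assuming relative gain is defined as the difference divided by the competing tool's value (the abstract does not state the formula explicitly), with $P$ and $R$ denoting precision and recall:

```latex
F_1 = \frac{2\,P\,R}{P + R},
\qquad
\Delta_{\text{rel}} = \frac{m_{\text{SHC-pt}} - m_{\text{other}}}{m_{\text{other}}}

\Delta_{\text{acc}} = \frac{58 - 52}{52} = \frac{6}{52} \approx 0.11538,
\qquad
\Delta_{F_1} = \frac{48.46 - 46.53}{46.53} = \frac{1.93}{46.53} \approx 0.04148
```

Truncated (not rounded) to two decimal places, these give the reported 11.53% and 4.14%; in absolute terms the gaps are 6 and 1.93 percentage points.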
This paper presents two contributions. First, it proposes SentiHealth, a method for detecting the mood of cancer patients who participate in patient communities on online social networks. Second, it presents SentiHealth-Cancer (SHC-pt), a tool instantiated from SentiHealth and dedicated to automatically analyzing posts in communities of cancer patients. This context-tailored tool outperformed the general-purpose sentiment analysis tools evaluated, at least in the cancer context. This suggests that, in future work, the SentiHealth method could be instantiated as tools for other diseases, for instance SentiHealth-HIV, SentiHealth-Stroke, and SentiHealth-Sclerosis.