Gohil Sunir, Vuik Sabine, Darzi Ara
Imperial College London, Department of Surgery and Cancer, London, United Kingdom.
JMIR Public Health Surveill. 2018 Apr 23;4(2):e43. doi: 10.2196/publichealth.5789.
Twitter is a microblogging service where users can send and read short, 140-character messages called "tweets." Many unstructured, free-text tweets relating to health care are shared on Twitter, which is becoming a popular area for health care research. Sentiment is a metric commonly used to investigate the positive or negative opinion expressed in these messages. Exploring the methods used for sentiment analysis in Twitter health care research may help clarify the options available for future research in this growing field.
The first objective of this study was to understand which tools are available for sentiment analysis in Twitter health care research, by reviewing existing studies in this area and the methods they used. The second objective was to determine which method would work best in health care settings, by analyzing how the methods were used to answer specific health care questions, how they were produced, and how their accuracy was analyzed.
A literature review was conducted of Twitter health care research studies that used a quantitative method of sentiment analysis on the free-text messages (tweets). The review compared the types of tools used in each study and examined the methods for tool production, tool training, and analysis of accuracy.
A total of 12 papers studying the quantitative measurement of sentiment in the health care setting were found. More than half of these studies produced tools specifically for their research, 4 used freely available open source tools, and 2 used commercially available software. Moreover, 4 of the 12 tools were trained using a smaller sample of the study's final data. On average, the sentiment method was trained against 0.45% (2816/627,024) of the total sample data. Only one of the 12 papers commented on the analysis of accuracy of the tool used.
Multiple methods are used for sentiment analysis of tweets in the health care setting. These range from self-produced basic categorizations to more complex and expensive commercial software. The open source and commercial methods were developed on product reviews and generic social media messages. None of these methods has been extensively tested against a corpus of health care messages to check its accuracy. This study suggests that there is a need for an accurate and tested tool for sentiment analysis of tweets, one that is first trained on a health care setting-specific corpus of manually annotated tweets.
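To make concrete what testing a method against manually annotated health care tweets could look like, the following is a minimal sketch in Python. It is not the method of any study in the review: the word lists, the `classify` and `accuracy` functions, and the four sample tweets are all hypothetical and exist only for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical toy lexicon, for illustration only; not taken from any reviewed tool.
POSITIVE = {"good", "great", "helpful", "thanks", "excellent"}
NEGATIVE = {"bad", "awful", "pain", "worse", "rude"}


def classify(tweet: str) -> str:
    """Label a tweet by counting lexicon hits (a basic categorization)."""
    words = [w.strip(".,!?\"'").lower() for w in tweet.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(annotated: Iterable[Tuple[str, str]]) -> float:
    """Fraction of tweets where the tool agrees with the human annotation."""
    pairs = list(annotated)
    if not pairs:
        raise ValueError("annotated corpus is empty")
    correct = sum(classify(text) == label for text, label in pairs)
    return correct / len(pairs)


# Invented example tweets with invented manual labels (assumption, not study data).
sample = [
    ("The nurses were great, thanks!", "positive"),
    ("Awful wait at the clinic, staff were rude.", "negative"),
    ("Appointment moved to Tuesday.", "neutral"),
    ("Not good at all.", "negative"),
]

print(accuracy(sample))
```

In this toy sample the last tweet is misclassified because the word count ignores negation, which illustrates why a tool built on generic text may need checking against annotated messages from the target setting before its output is relied on.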