Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, California, USA.
Department of Learning Health Sciences, School of Medicine, University of Michigan, Ann Arbor, Michigan, USA.
J Am Med Inform Assoc. 2021 Jun 12;28(6):1125-1134. doi: 10.1093/jamia/ocaa298.
Sentiment analysis is a popular tool for analyzing health-related social media content. However, existing studies exhibit numerous methodological issues and inconsistencies with respect to research design and results reporting, which could lead to biased data, imprecise or incorrect conclusions, or incomparable results across studies. This article reports a systematic analysis of the literature with respect to such issues. The objective was to develop a standardized protocol for improving the research validity and comparability of results in future relevant studies.
We developed the Protocol of Analysis of senTiment in Health (PATH) based on a systematic review that analyzed common research design choices, and how such choices were made or reported, among eligible studies published between 2010 and 2019.
Of 409 articles screened, 89 met the inclusion criteria. A total of 16 distinct research design choices were identified, 9 of which showed significant methodological or reporting inconsistencies among the articles reviewed, ranging from how the relevance of study data was determined to how the selected sentiment analysis tool was validated. Based on this result, we developed the PATH protocol, which encompasses all 16 design choices and highlights those for which careful consideration and detailed reporting are particularly warranted.
A substantial degree of methodological and reporting inconsistency exists in the extant literature that applies sentiment analysis to health-related social media data. The PATH protocol developed through this research may help mitigate such issues in future relevant studies.