Delanys Sarah, Benamara Farah, Moriceau Véronique, Olivier François, Mothe Josiane
Fédération Régionale de Recherche en Psychiatrie et santé mentale d'Occitanie, Toulouse, France.
Centre Hospitalier de Montauban, Montauban, France.
JMIR Form Res. 2022 Feb 14;6(2):e18539. doi: 10.2196/18539.
With the advent of digital technology, and specifically of user-generated content on social media, new ways have emerged to study the possible stigmatization of people in relation to mental health. Several studies have examined the discourse about psychiatric pathologies on Twitter, mostly considering tweets in English and a limited number of terms for psychiatric disorders. This paper presents the first study to analyze the use of a wide range of psychiatric terms in tweets in French.
Our aim is to study how generic, nosographic, and therapeutic psychiatric terms are used on Twitter in French. More specifically, our study has 3 complementary goals: (1) to analyze the types of psychiatric word use (medical, misuse, or irrelevant), (2) to analyze the polarity conveyed in the tweets that use these terms (positive, negative, or neutral), and (3) to compare the frequency of these terms with that observed in related work (mainly in English).
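The two annotation layers named in the goals above form small, closed label sets. A minimal Python sketch, assuming a hypothetical record layout (the names `WordUse`, `Polarity`, `Annotation`, and `parse_annotation` are illustrative and do not come from the paper), shows how restricting each layer to an enum rejects any label outside the scheme, such as a misspelled polarity.

```python
from dataclasses import dataclass
from enum import Enum


class WordUse(Enum):
    """Layer 1: how the psychiatric term is used in the tweet."""
    MEDICAL = "medical"
    MISUSE = "misuse"
    IRRELEVANT = "irrelevant"


class Polarity(Enum):
    """Layer 2: opinion orientation conveyed by the tweet."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class Annotation:
    """One annotated tweet (hypothetical record layout, not the paper's format)."""
    tweet_id: str
    term: str  # the psychiatric keyword matched in the tweet
    use: WordUse
    polarity: Polarity


def parse_annotation(tweet_id: str, term: str, use: str, polarity: str) -> Annotation:
    # Enum lookup by value raises ValueError for labels outside the scheme.
    return Annotation(tweet_id, term, WordUse(use), Polarity(polarity))
```

For example, `parse_annotation("t1", "bipolaire", "misuse", "negative")` yields a record whose `use` is `WordUse.MISUSE`, while a polarity label of `"neural"` is rejected with a `ValueError`.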
Our study was conducted on a corpus of French tweets posted from January 1, 2016, to December 31, 2018, and collected using dedicated keywords. The corpus was manually annotated by clinical psychiatrists following a multilayer annotation scheme that includes the type of word use and the opinion orientation of the tweet. A qualitative analysis was performed to measure the reliability of the manual annotation, and a quantitative analysis was then performed, focusing mainly on term frequency in each annotation layer and on the interactions between layers.
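The abstract does not say which statistic was used to assess annotation reliability. As an assumption, the sketch below uses Cohen's kappa, a common choice when two annotators label the same tweets on one layer (for example, word use); the function name `cohen_kappa` is hypothetical. Kappa corrects the raw agreement rate for the agreement expected by chance from each annotator's label distribution.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items on one layer."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere; kappa is
        # undefined here, so report perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

Applied separately to each layer, this gives one agreement score for word use and another for polarity; with more than two annotators, a multi-rater measure such as Fleiss' kappa would be the usual substitute.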
The first result is a resource in the form of an annotated dataset. The initial dataset is composed of 22,579 tweets in French containing at least one of the selected psychiatric terms. From this set, experts in psychiatry annotated a random sample of 3040 tweets, which constitutes the resource produced by our work. The second result is the analysis of these annotations, which shows that terms are misused in 45.33% (1378/3040) of the tweets and that the associated polarity is negative in 86.21% (1188/1378) of these cases. When all 3 types of term use are considered, 52.14% (1585/3040) of the tweets are associated with a negative polarity. Misused terms related to psychotic disorders (721/1300, 55.46%) were more frequent than those related to depression (15/280, 5.4%).
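The percentages above are plain proportions, except that 86.21% is conditional: it counts negative tweets among misuses only, not among all 3040 tweets. A minimal Python sketch, assuming annotations are available as hypothetical (use, polarity) label pairs (the function names are illustrative), shows both the cross-tabulation of the two layers and the arithmetic behind the reported figures.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def conditional_shares(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """For each word-use label, the share of its tweets carrying each polarity.

    `pairs` holds one (use, polarity) tuple per annotated tweet.
    """
    table: Dict[str, Counter] = defaultdict(Counter)
    for use, polarity in pairs:
        table[use][polarity] += 1
    shares = {}
    for use, row in table.items():
        total = sum(row.values())  # number of tweets with this use label
        shares[use] = {pol: n / total for pol, n in row.items()}
    return shares


def fmt_share(count: int, total: int) -> str:
    """Percentage with two decimals followed by the raw ratio, as in the abstract."""
    return f"{100 * count / total:.2f}% ({count}/{total})"


# Figures reported in the abstract, recomputed from their raw counts.
print(fmt_share(1378, 3040))  # → 45.33% (1378/3040): misuses among all tweets
print(fmt_share(1188, 1378))  # → 86.21% (1188/1378): negative among misuses
print(fmt_share(1585, 3040))  # → 52.14% (1585/3040): negative over all uses
```

The same helper gives 5.36% for 15/280, which the abstract rounds to one decimal (5.4%).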
Some psychiatric terms are misused in the corpora we studied, which is consistent with the results reported in related work in other languages. Thanks to the wide range of terms studied, this work highlights a disparity in how psychiatric terms are represented and used. Moreover, our study can help psychiatrists become aware of how these terms are used in new, widely used communication media such as social networks. The framework and guidelines we produced make the study reproducible, so that it can be repeated to analyze how term usage evolves over time. The newly built dataset is a valuable resource for other analytical studies and could also serve to train machine learning algorithms to automatically identify stigma in social media.