Klein Ari Z, Magge Arjun, O'Connor Karen, Gonzalez-Hernandez Graciela
Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.
Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States.
JMIR Aging. 2022 Sep 16;5(3):e39547. doi: 10.2196/39547.
More than 6 million people in the United States have Alzheimer disease and related dementias, receiving help from more than 11 million family or other informal caregivers. A range of traditional interventions has been developed to support family caregivers; however, most of them have not been implemented in practice and remain largely inaccessible. While recent studies have shown that family caregivers of people with dementia use Twitter to discuss their experiences, methods have not been developed to enable the use of Twitter for interventions.
The objective of this study is to develop an annotated data set and benchmark classification models for automatically identifying a cohort of Twitter users who have a family member with dementia.
Between May 4 and May 20, 2021, we collected 10,733 tweets, posted by 8846 users, that mention a dementia-related keyword, a linguistic marker that potentially indicates a diagnosis, and a select familial relationship. Three annotators annotated 1 random tweet per user to distinguish those that indicate having a family member with dementia from those that do not. Interannotator agreement was 0.82 (Fleiss kappa). We used the annotated tweets to train and evaluate support vector machine and deep neural network classifiers. To assess the scalability of our approach, we then deployed automatic classification on unlabeled tweets that were continuously collected between May 4, 2021, and March 9, 2022.
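The interannotator agreement statistic reported above can be illustrated with a minimal sketch of Fleiss kappa. The function name `fleiss_kappa` and the toy rating counts are assumptions made for illustration; they are not the study's code or data. Each row holds, for one tweet, the number of annotators who chose each of the two classes.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j (same rater count per item)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item, then averaged over items.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy data: 4 tweets, 3 annotators, 2 classes
# (column 0 = "has a family member with dementia", column 1 = "does not").
toy_counts = [[3, 0], [2, 1], [0, 3], [1, 2]]
toy_kappa = fleiss_kappa(toy_counts)
print(round(toy_kappa, 3))
```

A value of 1 means the annotators agreed on every item, and a value near 0 means agreement no better than chance.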
A deep neural network classifier based on a BERT (bidirectional encoder representations from transformers) model pretrained on tweets achieved the highest F-score of 0.962 (precision=0.946 and recall=0.979) for the class of tweets indicating that the user has a family member with dementia. The classifier detected 128,838 tweets that indicate having a family member with dementia, posted by 74,290 users between May 4, 2021, and March 9, 2022 (approximately 7500 users per month).
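The reported figures can be checked with a short arithmetic sketch. It assumes the F-score is the usual harmonic mean of precision and recall (F1), and it assumes an average month length of 30.4375 days for the monthly rate; neither assumption is stated in the abstract, and the variable names are illustrative only.

```python
from datetime import date


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
f1 = f_score(0.946, 0.979)

# Length of the collection window, endpoints taken from the abstract.
days = (date(2022, 3, 9) - date(2021, 5, 4)).days
months = days / 30.4375  # assumed average month length
users_per_month = 74290 / months

print(round(f1, 3))
print(days, round(users_per_month))
```

Under these assumptions the recomputed F-score rounds to the reported 0.962, and the monthly rate comes out somewhat below 7500, which is consistent with the abstract giving that figure only as an approximation.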
Our annotated data set can be used to automatically identify Twitter users who have a family member with dementia, enabling the use of Twitter on a large scale to not only explore family caregivers' experiences but also directly target interventions at these users.