Suppr超能文献

在推特上使用#ActuallyAutistic进行自闭症谱系障碍的精准诊断:机器学习研究

Using #ActuallyAutistic on Twitter for Precision Diagnosis of Autism Spectrum Disorder: Machine Learning Study.

作者信息

Jaiswal Aditi, Washington Peter

机构信息

Department of Information and Computer Sciences, University of Hawaii at Manoa, Honolulu, HI, United States.

出版信息

JMIR Form Res. 2024 Feb 14;8:e52660. doi: 10.2196/52660.

Abstract

BACKGROUND

The increasing use of social media platforms has given rise to an unprecedented surge in user-generated content, with millions of individuals publicly sharing their thoughts, experiences, and health-related information. Social media can serve as a useful means to study and understand public health. Twitter (subsequently rebranded as "X") is one such social media platform that has proven to be a valuable source of rich information for both the general public and health officials. We conducted the first study applying Twitter data mining to autism screening.

OBJECTIVE

This study used Twitter as the primary source of data to study the behavioral characteristics and real-time emotional projections of individuals identifying with autism spectrum disorder (ASD). We aimed to improve the rigor of ASD analytics research by using the digital footprint of an individual to study the linguistic patterns of individuals with ASD.

METHODS

We developed a machine learning model to distinguish individuals with autism from their neurotypical peers based on the textual patterns from their public communications on Twitter. We collected 6,515,470 tweets from users' self-identification with autism using "#ActuallyAutistic" and a separate control group to identify linguistic markers associated with ASD traits. To construct the data set, we targeted English-language tweets using the search query "#ActuallyAutistic" posted from January 1, 2014, to December 31, 2022. From these tweets, we identified unique users who used keywords such as "autism" OR "autistic" OR "neurodiverse" in their profile description and collected all the tweets from their timeline. To build the control group data set, we formulated a search query excluding the hashtag, "-#ActuallyAutistic," and collected 1000 tweets per day during the same time period. We trained a word2vec model and an attention-based, bidirectional long short-term memory model to validate the performance of per-tweet and per-profile classification models. We also illustrate the utility of the data set through common natural language processing tasks such as sentiment analysis and topic modeling.

RESULTS

Our tweet classifier reached a 73% accuracy, a 0.728 area under the receiver operating characteristic curve score, and an 0.71 F-score using word2vec representations fed into a logistic regression model, while the user profile classifier achieved an 0.78 area under the receiver operating characteristic curve score and an F-score of 0.805 using an attention-based, bidirectional long short-term memory model. This is a promising start, demonstrating the potential for effective digital phenotyping studies and large-scale intervention using text data mined from social media.

CONCLUSIONS

Textual differences in social media communications can help researchers and clinicians conduct symptomatology studies in natural settings.

摘要

背景

社交媒体平台使用的日益增加导致用户生成内容前所未有的激增,数百万人公开分享他们的想法、经历和与健康相关的信息。社交媒体可作为研究和理解公共卫生的有用手段。推特(后更名为“X”)就是这样一个社交媒体平台,已被证明是公众和卫生官员丰富信息的宝贵来源。我们开展了第一项将推特数据挖掘应用于自闭症筛查的研究。

目的

本研究将推特作为主要数据来源,以研究认同自闭症谱系障碍(ASD)的个体的行为特征和实时情绪投射。我们旨在通过利用个体的数字足迹来研究ASD个体的语言模式,提高ASD分析研究的严谨性。

方法

我们开发了一种机器学习模型,根据推特上公共交流的文本模式,将自闭症个体与其神经典型同龄人区分开来。我们使用“#ActuallyAutistic”从用户对自闭症的自我认同中收集了6515470条推文,并设立了一个单独的对照组,以识别与ASD特征相关的语言标记。为构建数据集,我们使用从2014年1月1日至2022年12月31日发布的搜索查询“#ActuallyAutistic”来定位英语推文。从这些推文中,我们识别出在个人资料描述中使用了“自闭症”“孤独症”或“神经多样性”等关键词的独特用户,并收集了他们时间轴上的所有推文。为构建对照组数据集,我们制定了一个排除该主题标签的搜索查询“-#ActuallyAutistic”,并在同一时间段内每天收集1000条推文。我们训练了一个词向量模型和一个基于注意力的双向长短期记忆模型,以验证每条推文和每个个人资料分类模型的性能。我们还通过情感分析和主题建模等常见的自然语言处理任务来说明数据集的效用。

结果

我们的推文分类器在将词向量表示输入逻辑回归模型时,准确率达到73%,受试者工作特征曲线下面积得分为0.728,F值为0.71;而用户资料分类器在使用基于注意力的双向长短期记忆模型时,受试者工作特征曲线下面积得分为0.78,F值为0.805。这是一个很有前景的开端,证明了利用从社交媒体挖掘的文本数据进行有效数字表型研究和大规模干预的潜力。

结论

社交媒体交流中的文本差异可帮助研究人员和临床医生在自然环境中开展症状学研究。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41f5/10902768/91f68e6721c2/formative_v8i1e52660_fig1.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验