Birnbaum Michael L, Ernala Sindhu Kiranmai, Rizvi Asra F, De Choudhury Munmun, Kane John M
The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States.
Feinstein Institute of Medical Research, Manhasset, NY, United States.
J Med Internet Res. 2017 Aug 14;19(8):e289. doi: 10.2196/jmir.7956.
Linguistic analysis of publicly available Twitter feeds has achieved success in differentiating individuals who self-disclose online as having schizophrenia from healthy controls. To date, limited efforts have included expert input to evaluate the authenticity of diagnostic self-disclosures.
This study aims to move from noisy self-reports of schizophrenia on social media to more accurate identification of diagnoses by exploring a human-machine partnered approach, wherein computational linguistic analysis of shared content is combined with clinical appraisals.
Twitter timeline data, extracted from 671 users with self-disclosed diagnoses of schizophrenia, were appraised for authenticity by expert clinicians. Data from disclosures deemed true were used to build a classifier aiming to distinguish users with schizophrenia from healthy controls. Results from the classifier were compared to expert appraisals on new, unseen Twitter users.
Significant linguistic differences were identified in the schizophrenia group, including greater use of interpersonal pronouns (P<.001), decreased emphasis on friendship (P<.001), and greater emphasis on biological processes (P<.001). The resulting classifier distinguished users with disclosures of schizophrenia deemed genuine from control users with a mean accuracy of 88% using linguistic data alone. When compared with clinician appraisals of new, unseen users, the classifier's precision, recall, and accuracy were 0.27, 0.77, and 0.59, respectively.
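For readers less familiar with the three measures reported above, the following is a minimal sketch of how precision, recall, and accuracy are computed when clinician appraisals are treated as the reference labels. The function name and the toy labels are hypothetical illustrations and are not the study's data or code; the toy example only shows how recall can be high while precision stays low when the classifier flags many users whom clinicians did not deem genuine.

```python
from typing import Sequence, Tuple


def precision_recall_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, accuracy) for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Hypothetical toy labels (NOT from the study): 1 = disclosure deemed genuine.
clinician_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
classifier_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

precision, recall, accuracy = precision_recall_accuracy(
    clinician_labels, classifier_labels
)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

In this toy case the classifier finds every user the clinicians accepted (recall 1.00) but also flags three users they rejected, so precision drops to 0.40 and accuracy to 0.70.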
These data reinforce the need for ongoing collaborations integrating expertise from multiple fields to strengthen our ability to accurately identify and effectively engage individuals with mental illness online. These collaborations are crucial to overcoming some of the biggest challenges in mental illness by using digital technology.