School of Communication and Mass Media, Northwest Missouri State University, Maryville, MO, United States.
Feinberg School of Medicine, Northwestern University, Chicago, IL, United States.
J Med Internet Res. 2023 Aug 22;25:e45589. doi: 10.2196/45589.
Smartphone-based apps are increasingly used to prevent relapse among those with substance use disorders (SUDs). These systems collect a wealth of data from participants, including the content of messages exchanged in peer-to-peer support forums. How individuals self-disclose and exchange social support in these forums may provide insight into their recovery course, but a manual review of a large corpus of text by human coders is inefficient.
The study sought to evaluate the feasibility of applying supervised machine learning (ML) to perform large-scale content analysis of an online peer-to-peer discussion forum. Machine-coded data were also used to understand how communication styles relate to writers' substance use and well-being outcomes.
Data were collected from a smartphone app that connects patients with SUDs to online peer support via a discussion forum. Overall, 268 adult patients with SUD diagnoses were recruited from 3 federally qualified health centers in the United States beginning in 2014. Two waves of survey data were collected to measure demographic characteristics and study outcomes: at baseline (before accessing the app) and after 6 months of using the app. Messages were downloaded from the peer-to-peer forum and subjected to manual content analysis. These data were used to train supervised ML algorithms using features extracted from the Linguistic Inquiry and Word Count (LIWC) system to automatically identify the types of expression relevant to peer-to-peer support. Regression analyses examined how each expression type was associated with recovery outcomes.
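As a rough illustration of the dictionary-based features described above, the sketch below computes the percentage of words in a message that fall into each word category, which is the general kind of output LIWC produces. The lexicon, category names, and function name here are hypothetical stand-ins (the actual LIWC dictionary is proprietary and much larger), and this is not the study's pipeline.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; not the real LIWC dictionary.
TOY_LEXICON = {
    "posemo": {"glad", "proud", "thanks", "hope"},
    "negemo": {"sad", "angry", "afraid", "hurt"},
    "insight": {"realize", "understand", "know", "think"},
}


def category_features(message: str) -> dict:
    """Return, per category, the percentage of words in the message that match it."""
    words = re.findall(r"[a-z']+", message.lower())
    if not words:
        return {name: 0.0 for name in TOY_LEXICON}
    counts = Counter()
    for word in words:
        for name, vocab in TOY_LEXICON.items():
            if word in vocab:
                counts[name] += 1
    return {name: 100.0 * counts[name] / len(words) for name in TOY_LEXICON}


# One feature vector per forum message; such vectors could then be passed,
# together with the hand-coded labels, to any supervised classifier.
features = category_features(
    "I realize I was angry, but I am proud and I hope to stay sober"
)
```

Each message yields one fixed-length feature vector, so the hand-coded messages can serve as a labeled training set for a separate classifier per expression type.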
Our manual content analysis identified 7 expression types relevant to the recovery process (emotional support, informational support, negative affect, change talk, insightful disclosure, gratitude, and universality disclosure). Over 6 months of app use, 86.2% (231/268) of participants posted on the app's support forum. Of these participants, 93.5% (216/231) posted at least 1 message in the content categories of interest, generating 10,503 messages. Supervised ML algorithms were trained on the hand-coded data, achieving F-scores ranging from 0.57 to 0.85. Regression analyses revealed that a greater proportion of the messages giving emotional support to peers was related to reduced substance use. For self-disclosure, a greater proportion of the messages expressing universality was related to improved quality of life, whereas a greater proportion of the negative affect expressions was negatively related to quality of life and mood.
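For readers less familiar with the metrics reported above, the following sketch shows how an F-score can be computed for one expression type by comparing machine-assigned labels against hand-coded labels, and how the reported participation percentages follow from the counts. It assumes the balanced F1 variant (the abstract does not say which F-score was used), and the label vectors are made-up toy data, not study data.

```python
def f1_score(true_labels, predicted_labels, positive=1):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 1 = message hand-coded (or machine-coded) as a given expression type.
hand_coded = [1, 1, 1, 0, 0, 1, 0, 0]
machine_coded = [1, 1, 0, 0, 1, 1, 0, 0]
toy_f1 = f1_score(hand_coded, machine_coded)

# Participation rates recomputed from the counts given in the abstract.
posted_pct = round(100 * 231 / 268, 1)       # participants who posted at all
in_category_pct = round(100 * 216 / 231, 1)  # posters with >= 1 coded message
```

In the toy data there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1 is 0.75, which sits within the 0.57 to 0.85 range the study reports across its 7 categories.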
This study highlights a method of natural language processing with potential to provide real-time insights into peer-to-peer communication dynamics. First, we found that our ML approach allowed for large-scale content coding while retaining moderate-to-high levels of accuracy. Second, individuals' expression styles were associated with recovery outcomes. The expression types of emotional support, universality disclosure, and negative affect were significantly related to recovery outcomes, and attending to these dynamics may be important for appropriate intervention.