Brown Andrew, Kumar Ash Tanuj, Melamed Osnat, Ahmed Imtihan, Wang Yu Hao, Deza Arnaud, Morcos Marc, Zhu Leon, Maslej Marta, Minian Nadia, Sujaya Vidya, Wolff Jodi, Doggett Olivia, Iantorno Mathew, Ratto Matt, Selby Peter, Rose Jonathan
The Edward S Rogers Sr Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada.
INTREPID Lab, Centre for Addiction and Mental Health, Toronto, ON, Canada.
JMIR Ment Health. 2023 Oct 17;10:e49132. doi: 10.2196/49132.
The motivational interviewing (MI) approach has been shown to help move ambivalent smokers toward the decision to quit smoking. There have been several attempts to broaden access to MI through text-based chatbots. These typically use scripted responses to client statements, but such nonspecific responses have been shown to reduce effectiveness. Recent advances in natural language processing provide a new way to create responses that are specific to a client's statements, using a generative language model.
This study aimed to design, evolve, and measure the effectiveness of a chatbot system that can guide ambivalent people who smoke toward the decision to quit smoking with MI-style generative reflections.
Over time, 4 versions of the MI chatbot were developed, and each version was tested with a separate group of ambivalent smokers. A total of 349 smokers were recruited through a web-based recruitment platform. The first chatbot version only asked questions, without reflecting on the answers. The second version asked the questions and provided reflections using an initial version of the reflection generator. The third version used an improved reflection generator, and the fourth version added extended interaction on some of the questions. Participants' readiness to quit was measured before the conversation and 1 week later using an 11-point scale that measured 3 attributes related to smoking cessation: readiness, confidence, and importance. The number of quit attempts made in the week before the conversation and in the week after was surveyed; in addition, participants rated the perceived empathy of the chatbot. The main body of the conversation consisted of 5 scripted questions, responses from participants, and (for 3 of the 4 versions) generated reflections. To generate the MI reflections, a pretrained transformer-based neural network was fine-tuned on examples of high-quality reflections.
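The conversation structure described above (5 scripted questions, a participant response to each, and an optional generated reflection) can be illustrated with a minimal Python sketch. This is not the study's implementation: the question texts, the `run_conversation` function, and the `stub_reflection` placeholder are all hypothetical, and the stub merely stands in for the fine-tuned transformer model, which is not reproduced here.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical placeholders: the abstract does not list the 5 scripted questions.
QUESTIONS: List[str] = [f"Scripted question {i}" for i in range(1, 6)]

# A reflection generator maps (question, answer) to a reflection string.
Reflector = Callable[[str, str], str]


def run_conversation(
    answers: List[str], reflect: Optional[Reflector] = None
) -> List[Tuple[str, str]]:
    """Build a transcript of (speaker, text) turns.

    With reflect=None this mimics the question-only version;
    with a reflector it mimics the versions that add reflections.
    """
    transcript: List[Tuple[str, str]] = []
    for question, answer in zip(QUESTIONS, answers):
        transcript.append(("bot", question))
        transcript.append(("client", answer))
        if reflect is not None:
            transcript.append(("bot", reflect(question, answer)))
    return transcript


def stub_reflection(question: str, answer: str) -> str:
    # Assumption: a trivial stand-in, NOT the study's generative model.
    return f"It sounds like you are saying: {answer}"


if __name__ == "__main__":
    demo_answers = ["answer"] * 5
    for speaker, text in run_conversation(demo_answers, stub_reflection):
        print(f"{speaker}: {text}")
```

Passing `reflect=None` yields 10 turns (question and answer only), whereas passing a reflector yields 15, which mirrors the difference between the first version and the later ones.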
The increase in average confidence using the nongenerative version was 1.0 (SD 2.0; P=.001), whereas for the 3 generative versions, the increases ranged from 1.2 to 1.3 (SD 2.0-2.3; P<.001). The extended conversation with improved generative reflections was the only version associated with a significant increase in average importance (0.7, SD 2.0; P<.001) and readiness (0.4, SD 1.7; P=.01). The versions with improved reflections and with extended conversation exhibited significantly better perceived empathy than the nongenerative version (P=.02 and P=.004, respectively). The number of quit attempts did not change significantly between the week before the conversation and the week after in any of the 4 versions.
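The increases reported above are mean within-participant changes with their SDs. As a rough illustration of how such a summary might be computed, the following Python sketch takes paired before and after ratings and returns the mean and sample SD of the differences. The `change_summary` function and the ratings are hypothetical, the scale is assumed to run from 0 to 10, and the abstract does not state which statistical test produced the P values, so none is shown.

```python
from statistics import mean, stdev
from typing import List, Tuple


def change_summary(before: List[int], after: List[int]) -> Tuple[float, float]:
    """Return (mean change, sample SD of change) for paired ratings."""
    if len(before) != len(after):
        raise ValueError("before and after must be paired, one value per participant")
    # Positive values mean the rating went up after the conversation.
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs), stdev(diffs)


if __name__ == "__main__":
    # Hypothetical ratings for 5 participants (NOT study data),
    # assuming an 11-point scale that runs from 0 to 10.
    confidence_before = [3, 5, 4, 6, 2]
    confidence_after = [4, 7, 4, 8, 3]
    avg, sd = change_summary(confidence_before, confidence_after)
    print(f"mean change = {avg:.1f}, SD = {sd:.1f}")
```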
The results suggest that generative reflections increase the impact of a conversation on readiness to quit smoking 1 week later, although a substantial portion of the impact seen so far can be achieved by asking questions alone, without the reflections. These results support further evolution of the chatbot conversation and can serve as a basis for comparison against more advanced versions.