Kocaballi Ahmet Baki, Quiroz Juan C, Rezazadegan Dana, Berkovsky Shlomo, Magrabi Farah, Coiera Enrico, Laranjo Liliana
Australian Institute of Health Innovation, Macquarie University, Sydney, Australia.
NOVA National School of Public Health, Public Health Research Centre, Universidade NOVA de Lisboa, Lisbon, Portugal.
J Med Internet Res. 2020 Feb 9;22(2):e15823. doi: 10.2196/15823.
Conversational agents (CAs) are systems that mimic human conversation using text or spoken language. Widely used examples include voice-activated systems such as Apple Siri, Google Assistant, Amazon Alexa, and Microsoft Cortana. The use of CAs in health care has been rising, but their potential safety risks remain understudied.
This study aimed to analyze how commonly available, general-purpose CAs on smartphones and smart speakers respond to health and lifestyle prompts (questions and open-ended statements), examining both the content and the structure of their responses.
We followed a piloted script to present health- and lifestyle-related prompts to 8 CAs. The CAs' responses were assessed for appropriateness based on prompt type: responses to safety-critical prompts were deemed appropriate if they included a referral to a health professional or service, whereas responses to lifestyle prompts were deemed appropriate if they provided relevant information to address the problem. Response structure was also examined with respect to information source (Web search based or precoded), response content style (informative and/or directive), confirmation of prompt recognition, and empathy.
The 8 CAs provided a total of 240 responses to 30 prompts. Collectively, they responded appropriately to 41% (46/112) of the safety-critical prompts and 39% (37/96) of the lifestyle prompts. The proportion of appropriate responses dropped when safety-critical prompts were rephrased or when the agent used a voice-only interface. Appropriate responses mostly included directive content and empathy statements for the safety-critical prompts, and a mix of informative and directive content for the lifestyle prompts.
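The appropriateness percentages above can be checked directly from the reported counts; a minimal sketch (the counts are taken from the abstract, and the function name is illustrative only):

```python
def appropriateness_pct(appropriate: int, total: int) -> int:
    """Percentage of responses rated appropriate, rounded to a whole percent."""
    return round(100 * appropriate / total)

# Counts reported in the Results: 46 of 112 safety-critical responses
# and 37 of 96 lifestyle responses were rated appropriate.
safety_critical = appropriateness_pct(46, 112)  # 41
lifestyle = appropriateness_pct(37, 96)         # 39
print(safety_critical, lifestyle)
```

This reproduces the 41% and 39% figures stated in the Results.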
Our results suggest that commonly available, general-purpose CAs on smartphones and smart speakers with unconstrained natural language interfaces are limited in their ability to respond appropriately to both safety-critical and lifestyle health prompts. Our study also identified some of the response structures the CAs used in their appropriate responses. Further investigation is needed to establish guidelines for designing suitable response structures for different prompt types.