Assessment of Artificial Intelligence Chatbot Responses to Top Searched Queries About Cancer

Author affiliations

Department of Urology, State University of New York Downstate Health Sciences University, New York.

Department of Urology, New York University School of Medicine, New York.

Publication information

JAMA Oncol. 2023 Oct 1;9(10):1437-1440. doi: 10.1001/jamaoncol.2023.2947.

DOI: 10.1001/jamaoncol.2023.2947

PMID: 37615960

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10450581/

Abstract

IMPORTANCE

Consumers are increasingly using artificial intelligence (AI) chatbots as a source of information. However, the quality of the cancer information generated by these chatbots has not yet been evaluated using validated instruments.

OBJECTIVE

To characterize the quality of information and presence of misinformation about skin, lung, breast, colorectal, and prostate cancers generated by 4 AI chatbots.

DESIGN, SETTING, AND PARTICIPANTS

This cross-sectional study assessed AI chatbots' text responses to the 5 most commonly searched queries related to the 5 most common cancers using validated instruments. Search data were extracted from the publicly available Google Trends platform and identical prompts were used to generate responses from 4 AI chatbots: ChatGPT version 3.5 (OpenAI), Perplexity (Perplexity.AI), Chatsonic (Writesonic), and Bing AI (Microsoft).

EXPOSURES

Google Trends' top 5 search queries related to skin, lung, breast, colorectal, and prostate cancer from January 1, 2021, to January 1, 2023, were input into 4 AI chatbots.

MAIN OUTCOMES AND MEASURES

The primary outcomes were the quality of consumer health information based on the validated DISCERN instrument (scores from 1 [low] to 5 [high] for quality of information) and the understandability and actionability of this information based on the understandability and actionability domains of the Patient Education Materials Assessment Tool (PEMAT) (scores of 0%-100%, with higher scores indicating a higher level of understandability and actionability). Secondary outcomes included misinformation scored using a 5-item Likert scale (scores from 1 [no misinformation] to 5 [high misinformation]) and readability assessed using the Flesch-Kincaid Grade Level readability score.
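
As an illustration of two of the measures named above, the following is a minimal Python sketch, not the study's own scoring procedure. It assumes the standard Flesch-Kincaid Grade Level formula (0.39 × words per sentence + 11.8 × syllables per word − 15.59) and the usual PEMAT convention of scoring each applicable item as agree (1) or disagree (0) and reporting the percentage of points earned. The function names (`count_syllables`, `flesch_kincaid_grade`, `pemat_score`) and the vowel-group syllable heuristic are assumptions introduced here for illustration; dedicated readability tools count syllables more carefully and may give slightly different grades.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def pemat_score(ratings: list) -> float:
    """Percentage of applicable items rated 'agree'.

    ratings: 1 (agree), 0 (disagree), or None (not applicable, excluded).
    """
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("at least one applicable item is required")
    return 100.0 * sum(applicable) / len(applicable)


if __name__ == "__main__":
    # Hypothetical chatbot-style sentence, not taken from the study data.
    sample = "Immunotherapy administration necessitates multidisciplinary coordination."
    print(round(flesch_kincaid_grade(sample), 1))
    # Hypothetical item ratings: one of five applicable items rated 'agree'.
    print(pemat_score([1, 0, 0, 0, 0]))
```

Under these assumptions, a response in which only one of five applicable actionability items is rated "agree" scores 20.0%, which shows how a percentage on the 0%-100% scale arises from a short checklist.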

RESULTS

The analysis included 100 responses from 4 chatbots about the 5 most common search queries for skin, lung, breast, colorectal, and prostate cancer. The quality of text responses generated by the 4 AI chatbots was good (median [range] DISCERN score, 5 [2-5]) and no misinformation was identified. Understandability was moderate (median [range] PEMAT Understandability score, 66.7% [33.3%-90.1%]), and actionability was poor (median [range] PEMAT Actionability score, 20.0% [0%-40.0%]). The responses were written at the college level based on the Flesch-Kincaid Grade Level score.

CONCLUSIONS AND RELEVANCE

Findings of this cross-sectional study suggest that AI chatbots generally produce accurate information for the top cancer-related search queries, but the responses are not readily actionable and are written at a college reading level. These limitations suggest that AI chatbots should be used supplementarily and not as a primary source for medical information.
