User-centric AI: evaluating the usability of generative AI applications through user reviews on app stores.

Author information

Alabduljabbar Reham

Affiliation

Information Technology Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.

Publication information

PeerJ Comput Sci. 2024 Oct 25;10:e2421. doi: 10.7717/peerj-cs.2421. eCollection 2024.

DOI: 10.7717/peerj-cs.2421
PMID: 39650468
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11623163/

Abstract

This article presents a usability evaluation and comparison of generative AI applications through the analysis of user reviews from popular digital marketplaces, specifically Apple's App Store and Google Play. The study aims to bridge the research gap in real-world usability assessments of generative AI tools. A total of 11,549 reviews were extracted and analyzed from January to March 2024 for five generative AI apps: ChatGPT, Bing AI, Microsoft Copilot, Gemini AI, and Da Vinci AI. The dataset has been made publicly available, allowing for further analysis by other researchers. The evaluation follows ISO 9241 usability standards, focusing on effectiveness, efficiency, and user satisfaction. This study is believed to be the first usability evaluation for generative AI applications using user reviews across digital marketplaces. The results show that ChatGPT achieved the highest compound usability scores among Android and iOS users, with scores of 0.504 and 0.462, respectively. Conversely, Gemini AI scored the lowest among Android apps at 0.016, and Da Vinci AI had the lowest among iOS apps at 0.275. Satisfaction scores were critical in usability assessments, with ChatGPT obtaining the highest rates of 0.590 for Android and 0.565 for iOS, while Gemini AI had the lowest satisfaction rate at -0.138 for Android users. The findings revealed usability issues related to ease of use, functionality, and reliability in generative AI tools, providing valuable insights from user opinions and feedback. Based on the analysis, actionable recommendations were proposed to enhance the usability of generative AI tools, aiming to address identified usability issues and improve the overall user experience. This study contributes to a deeper understanding of user experiences and offers valuable guidance for enhancing the usability of generative AI applications.
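The abstract reports per-dimension scores (effectiveness, efficiency, satisfaction) and a compound usability score, but does not state how they are aggregated. The following is only a minimal sketch of one plausible aggregation, assuming each review has already been tagged with one ISO 9241 dimension and a sentiment score in [-1, 1], and assuming the compound score is the unweighted mean of the dimension averages. The function names, the tagging scheme, and the toy data are hypothetical and are not taken from the article.

```python
from collections import defaultdict
from statistics import mean

# The three ISO 9241 usability dimensions named in the abstract.
DIMENSIONS = ("effectiveness", "efficiency", "satisfaction")


def dimension_scores(reviews):
    """Average the sentiment of the reviews tagged with each dimension.

    `reviews` is an iterable of (dimension, sentiment) pairs, where
    sentiment is a float in [-1, 1]. This input format is an assumption
    made for illustration, not the article's data format.
    """
    buckets = defaultdict(list)
    for dimension, sentiment in reviews:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension!r}")
        if not -1.0 <= sentiment <= 1.0:
            raise ValueError(f"sentiment out of range: {sentiment!r}")
        buckets[dimension].append(sentiment)
    # Dimensions with no reviews are omitted rather than scored as zero.
    return {d: mean(buckets[d]) for d in DIMENSIONS if buckets[d]}


def compound_usability(scores):
    """Unweighted mean of the available dimension scores (an assumption)."""
    if not scores:
        raise ValueError("no dimension scores to aggregate")
    return mean(list(scores.values()))


# Hypothetical toy reviews for a single app on a single store.
toy_reviews = [
    ("effectiveness", 0.5),
    ("effectiveness", 0.25),
    ("efficiency", 0.0),
    ("satisfaction", 0.75),
    ("satisfaction", -0.25),
]

scores = dimension_scores(toy_reviews)
compound = compound_usability(scores)
print(scores)
print(round(compound, 3))
```

Under this scheme a negative satisfaction average, like the one the abstract reports for Gemini AI on Android, pulls the compound score down even when the other dimensions are mildly positive. A weighted mean would be an equally reasonable choice if some dimensions were considered more important.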

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3290/11623163/833f419d4185/peerj-cs-10-2421-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3290/11623163/7fb76e6e23c6/peerj-cs-10-2421-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3290/11623163/ee5c41752b49/peerj-cs-10-2421-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3290/11623163/da91634162b8/peerj-cs-10-2421-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3290/11623163/3af3a9de1e2b/peerj-cs-10-2421-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3290/11623163/29077c816dec/peerj-cs-10-2421-g006.jpg
