Evaluating Quality and Readability of AI-generated Information on Living Kidney Donation.

Author information

Villani Vincenzo, Nguyen Hong-Hanh T, Shanmugarajah Kumaran

Affiliations

Division of Immunology and Organ Transplantation, McGovern Medical School at UTHealth Houston, Houston, TX.

Liver Specialists of Texas, Houston, TX.

Publication information

Transplant Direct. 2024 Dec 10;11(1):e1740. doi: 10.1097/TXD.0000000000001740. eCollection 2025 Jan.

DOI:10.1097/TXD.0000000000001740

PMID:39668891

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11634323/

Abstract

BACKGROUND

The availability of high-quality, easy-to-read informational material is crucial to giving prospective kidney donors accurate information. The quality of this information has been associated with the likelihood of proceeding with living donation. Artificial intelligence-based large language models (LLMs) have recently become common tools for finding information online, including medical information. This study aimed to assess the quality and readability of artificial intelligence-generated information on kidney donation.

METHODS

The authors developed a set of 35 common donor questions and used it to query 3 LLMs (ChatGPT, Google Gemini, and MedGPT). The answers were collected and evaluated independently with the CLEAR tool for (1) completeness, (2) lack of false information, (3) evidence-based information, (4) appropriateness, and (5) relevance. Readability was evaluated using the Flesch-Kincaid Reading Ease Score and the Flesch-Kincaid Grade Level.
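For reference, both readability metrics are built from two ratios: average sentence length (words per sentence) and average word length (syllables per word). The conventional formulas are Reading Ease = 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words) and Grade Level = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The Python sketch below applies them. The helper names `count_syllables` and `readability`, the regex sentence splitter, and the vowel-group syllable heuristic are hypothetical simplifications, not the software used in the study, so its scores will drift somewhat from dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent (e.g. "dose"), except in "-le" endings (e.g. "able").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)  # every word is assumed to have at least one syllable


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for plain English text."""
    # Assumption: sentences end with '.', '!' or '?'; abbreviations will over-split.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level
```

Because both formulas penalize long sentences and polysyllabic words, answers that rely on clinical vocabulary score lower on Reading Ease and higher on Grade Level.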

RESULTS

The interrater intraclass correlation coefficient was 0.784 (95% confidence interval, 0.716-0.814). Median CLEAR scores were 22 for ChatGPT (interquartile range [IQR], 3.67), 24.33 for Google Gemini (IQR, 2.33), and 23.33 for MedGPT (IQR, 2.00). ChatGPT, Gemini, and MedGPT had mean Flesch-Kincaid Reading Ease Scores of 37.32 (SD = 10.00), 39.42 (SD = 13.49), and 29.66 (SD = 7.94), respectively. On the Flesch-Kincaid Grade Level, ChatGPT averaged 12.29, Gemini 10.63, and MedGPT 13.21 (P < 0.001), indicating that text from all 3 LLMs required a college-level education to read.
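To connect the mean Reading Ease scores to education levels, the sketch below maps a score onto one commonly cited version of Flesch's interpretation table. Band cut-offs and labels vary slightly between sources, so the `FRE_BANDS` table and the `fre_band` helper are illustrative assumptions; the only data used are the mean scores reported in this abstract.

```python
# One commonly cited version of Flesch's Reading Ease interpretation table:
# (lower bound, difficulty label, approximate school level).
# Assumption: cut-offs and labels differ slightly between published variants.
FRE_BANDS: list[tuple[float, str, str]] = [
    (90.0, "very easy", "5th grade"),
    (80.0, "easy", "6th grade"),
    (70.0, "fairly easy", "7th grade"),
    (60.0, "standard", "8th-9th grade"),
    (50.0, "fairly difficult", "10th-12th grade"),
    (30.0, "difficult", "college"),
    (float("-inf"), "very difficult", "college graduate"),
]


def fre_band(score: float) -> tuple[str, str]:
    """Return (difficulty label, approximate school level) for a Reading Ease score."""
    # Bands are ordered from easiest to hardest, so the first matching lower bound wins.
    for lower_bound, label, level in FRE_BANDS:
        if score >= lower_bound:
            return label, level
    raise ValueError(f"not a comparable score: {score!r}")  # only reached for NaN


# Mean Reading Ease scores as reported in the abstract.
reported_means = {"ChatGPT": 37.32, "Gemini": 39.42, "MedGPT": 29.66}

for model, score in reported_means.items():
    label, level = fre_band(score)
    print(f"{model}: {score:.2f} -> {label} ({level})")
```

Under this table, the ChatGPT and Gemini means fall in the "difficult" (college) band and the MedGPT mean falls in the "very difficult" (college graduate) band, which is consistent with the conclusion that all three models produce college-level text.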

CONCLUSIONS

Current LLMs provide fairly accurate responses to common questions from prospective living kidney donors; however, the generated information is complex and requires an advanced level of education to understand. As LLMs become more relevant as a source of medical information, transplant providers should familiarize themselves with the shortcomings of these technologies.

