BACKGROUND

Artificial intelligence (AI) large language models (LLMs) are becoming increasingly popular, and patients and families are more likely to use LLMs when conducting internet-based research about scoliosis. For this reason, it is vital to understand the abilities and limitations of this technology in disseminating accurate medical information. We used an expert panel to compare LLM-generated and professional society-authored answers to frequently asked questions about pediatric scoliosis.

METHODS

We used three publicly available LLMs to generate answers to 15 frequently asked questions (FAQs) about pediatric scoliosis. The FAQs were drawn from the Scoliosis Research Society, the American Academy of Orthopaedic Surgeons, and the Pediatric Spine Foundation. We gave the LLMs minimal instruction beyond specifying the response length and requesting answers at a 5th-grade reading level. To assess readability, responses were entered into an open-source calculator. A 15-question survey was distributed to an expert panel of pediatric spine surgeons. For each FAQ, panel members were shown one AI-generated and one physician-generated response and asked to select which they preferred. They were then asked to grade the accuracy of each response individually on a Likert scale.

RESULTS

The panel members had a mean of 8.9 years of post-fellowship experience (range: 3-23 years). The panel reported nearly equivalent agreement with AI-generated and physician-generated answers. The expert panel favored professional society-written responses for 40% of questions and AI-generated responses for 40%, rated the two responses equally good for 13%, and was tied between AI and "equally good" for 7%. For two professional society-generated and three AI-generated responses, the error bars of the expert panel's mean score for accuracy and appropriateness fell below neutral, indicating a lack of consensus and mixed opinions about the response.
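The preference percentages above can be read as whole-question counts. The following is a minimal sketch, assuming each percentage is a share of the 15 survey questions rounded to the nearest whole percent; the abstract does not report the counts themselves, and the function name `implied_count` is hypothetical.

```python
from fractions import Fraction

TOTAL_QUESTIONS = 15  # number of FAQs stated in METHODS

# Preference shares reported in RESULTS, in percent.
reported_percent = {
    "professional society": 40,
    "AI": 40,
    "equally good": 13,
    "tie between AI and 'equally good'": 7,
}


def implied_count(percent, total=TOTAL_QUESTIONS):
    """Return the unique question count whose share of `total` rounds to `percent`.

    Assumption (not stated in the abstract): percentages were rounded
    to the nearest whole number.
    """
    matches = [
        k for k in range(total + 1)
        if round(100 * Fraction(k, total)) == percent
    ]
    if len(matches) != 1:
        raise ValueError(f"{percent}% does not map to a unique count out of {total}")
    return matches[0]


counts = {label: implied_count(p) for label, p in reported_percent.items()}

for label, count in counts.items():
    print(f"{label}: {count} of {TOTAL_QUESTIONS} questions")
```

Under that rounding assumption, the four categories correspond to 6, 6, 2, and 1 questions, which together account for all 15.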

CONCLUSIONS

Based on the expert panel review, AI delivered accurate and appropriate answers as frequently as the FAQ answers authored by professional societies on their websites. AI and professional society websites were equally likely to generate answers with which the expert panel disagreed.

KEY CONCEPTS

(1) Large language models (LLMs) are increasingly used for generating medical information online, necessitating an evaluation of their accuracy and effectiveness compared with traditional sources.
(2) An expert panel of physicians compared artificial intelligence (AI)-generated answers with professional society-authored answers to pediatric scoliosis frequently asked questions, finding that both types of answers were equally favored in terms of accuracy and appropriateness.
(3) The panel reported a similar rate of disagreement with AI-generated and professional society-generated answers, indicating that both had areas of controversy.
(4) Over half of the expert panel members felt they could distinguish between AI-generated and professional society-generated answers, but this did not relate to their preferences.
(5) While AI can support medical information dissemination, further research and improvements are needed to address its limitations and ensure high-quality, accessible patient education.

LEVELS OF EVIDENCE

IV.

Artificial Intelligence-Based Large Language Models Can Facilitate Patient Education.
