
Evaluating tonsillectomy-related YouTube videos via a human expert review and the ChatGPT-4: a multi-method quality analysis.

Author information

Serifler Serkan, Gul Fatih

Affiliations

School of Medicine, Department of Otolaryngology, Head and Neck Surgery, Ankara Yıldırım Beyazıt University, Ankara, Turkey.

Department of Otolaryngology, Head and Neck Surgery, Lokman Hekim University, Ankara, Turkey.

Publication information

BMC Med Educ. 2025 Aug 11;25(1):1157. doi: 10.1186/s12909-025-07739-x.

Abstract

BACKGROUND

The quality and reliability of health-related content on YouTube remain a growing concern. This study aimed to evaluate tonsillectomy-related YouTube videos using a multi-method framework that combines human expert review, large language model (ChatGPT-4) analysis, and transcript readability assessment.

METHODS

A total of 76 English-language YouTube videos were assessed. Two otolaryngologists independently rated video quality using the DISCERN instrument and JAMA benchmarks. Corrected transcripts were evaluated by ChatGPT-4 (May 2024 version) for accuracy and completeness. Spearman correlations and regression analyses were used to explore associations between human and AI evaluations. Videos were also categorized as transcript-heavy or visually rich to examine the effect of visual presentation.
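To illustrate the correlation step described above, the sketch below computes a Spearman rank correlation between expert ratings and ChatGPT-4 scores. It is a minimal example under assumed data, not the authors' analysis code; the variable names and sample values are hypothetical.

# Minimal sketch: Spearman correlation between human and AI scores (hypothetical data).
from scipy.stats import spearmanr

# Hypothetical per-video scores; the study correlated JAMA/DISCERN ratings
# with ChatGPT-4 accuracy and completeness scores.
jama_scores = [2, 3, 1, 4, 3, 2, 4, 1]
gpt4_accuracy = [6, 7, 4, 9, 8, 5, 9, 3]

rho, p_value = spearmanr(jama_scores, gpt4_accuracy)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")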

RESULTS

Professional videos consistently outperformed patient-generated content in quality metrics. ChatGPT-4 accuracy scores showed a strong correlation with JAMA ratings (ρ = 0.56), while completeness was strongly associated with DISCERN scores (ρ = 0.72). Visually rich videos demonstrated significantly higher AI accuracy than transcript-heavy videos (Cohen's d = 0.600, p = 0.030), suggesting that visual context may enhance transcript-based interpretation. However, the average transcript readability (FKGL = 8.38) exceeded the recommended level for patient education.
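For reference, the Flesch-Kincaid Grade Level (FKGL) reported above is derived from average sentence length and average syllables per word. The sketch below applies the standard FKGL formula to a transcript string; the naive syllable counter is an illustrative simplification and not the readability tool used in the study.

# Minimal sketch: Flesch-Kincaid Grade Level of a transcript (naive syllable counting).
import re

def count_syllables(word):
    # Rough heuristic: count vowel groups; dedicated readability tools use better rules.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard FKGL formula: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

transcript = "Tonsillectomy removes the tonsils. Recovery usually takes one to two weeks."
print(f"FKGL = {fkgl(transcript):.2f}")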

CONCLUSION

Tonsillectomy-related YouTube content varies widely in quality. Human-AI alignment supports the use of large language models for preliminary content screening. Visually enriched content may improve AI interpretability, while readability concerns highlight the need for more accessible educational resources. Multimodal evaluation and design should be prioritized in future digital health content.

