


Large Language Model-Assisted Surgical Consent Forms in Non-English Language: Content Analysis and Readability Evaluation.

Authors

Oh Namkee, Kim Jongman, Park Sunghae, An Sunghyo, Lee Eunjin, Do Hayeon, Baik Jiyoung, Gwon Suk Min, Rhu Jinsoo, Choi Gyu-Seong, Park Seonmin, Cho Jai Young, Lee Hae Won, Lee Boram, Jeong Eun Sung, Lee Jeong-Moo, Choi YoungRok, Kwon Jieun, Kim Kyeong Deok, Kim Seok-Hwan, Chun Gwang-Sik

Affiliations

Department of Surgery, Samsung Medical Center, 81 Ilwonro, Seoul, Republic of Korea.

Department of Surgery, Seoul National University Bundang Hospital, Seongnam-si, Republic of Korea.

Publication

J Med Internet Res. 2025 Jun 19;27:e73222. doi: 10.2196/73222.

DOI:10.2196/73222
PMID:40537063
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12200805/
Abstract

BACKGROUND

Surgical consent forms convey critical information, yet their complex language can limit patient comprehension. Large language models (LLMs) can simplify complex information and improve readability, but evidence on how LLM-generated modifications affect content preservation in non-English consent forms is lacking.

OBJECTIVE

This study evaluates the impact of LLM-assisted editing on the readability and content quality of Korean surgical consent forms, specifically standardized liver resection consent documents, across multiple institutions.

METHODS

Standardized liver resection consent forms were collected from 7 South Korean medical institutions, and these forms were simplified using ChatGPT-4o. Thereafter, readability was assessed using KReaD and Natmal indices, while text structure was evaluated based on character count, word count, sentence count, words per sentence, and difficult word ratio. Content quality was analyzed across 4 domains-risk, benefit, alternative, and overall impression-using evaluations from 7 liver resection specialists. Statistical comparisons were conducted using paired 2-sided t tests, and a linear mixed-effects model was applied to account for institutional and evaluator variability.
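KReaD and Natmal are proprietary Korean readability indices and are not reproduced here; the generic text-structure measures and the paired comparison, however, can be sketched as below. This is a minimal illustration, with a hypothetical easy-word list standing in for the graded lexicon a real difficult-word criterion would require, and made-up per-institution scores.

```python
import math
import re
from statistics import mean, stdev

# Hypothetical easy-word list; the study would rely on a graded Korean
# frequency lexicon to decide which words count as "difficult".
EASY_WORDS = {"the", "a", "patient", "may", "risk", "surgery"}

def text_metrics(text):
    """Character count, word count, sentence count, words per sentence,
    and difficult-word ratio, as described in the methods."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    difficult = sum(1 for w in words if w not in EASY_WORDS)
    return {
        "characters": len(text),
        "words": len(words),
        "sentences": len(sentences),
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "difficult_ratio": difficult / max(len(words), 1),
    }

def paired_t(before, after):
    """Paired two-sided t statistic (df = n - 1) on matched scores."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Illustrative per-institution readability scores (n = 7 institutions),
# before and after LLM simplification.
before = [1750, 1780, 1800, 1770, 1760, 1790, 1789]
after = [1300, 1350, 1400, 1320, 1310, 1330, 1339]
t_stat = paired_t(before, after)
```

With 7 paired observations the t statistic is compared against a t distribution with 6 degrees of freedom; scipy.stats.ttest_rel would return the same statistic together with a two-sided p value.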

RESULTS

Artificial intelligence-assisted editing significantly improved readability, reducing the KReaD score from 1777 (SD 28.47) to 1335.6 (SD 59.95) (P<.001) and the Natmal score from 1452.3 (SD 88.67) to 1245.3 (SD 96.96) (P=.007). Sentence length and difficult word ratio decreased significantly, contributing to increased accessibility (P<.05). However, content quality analysis showed a decline in the risk description scores (before: 2.29, SD 0.47 vs after: 1.92, SD 0.32; P=.06) and overall impression scores (before: 2.21, SD 0.49 vs after: 1.71, SD 0.64; P=.13). The linear mixed-effects model confirmed significant reductions in risk descriptions (β₁=-0.371; P=.01) and overall impression (β₁=-0.500; P=.03), suggesting potential omissions in critical safety information. Despite this, qualitative analysis indicated that evaluators did not find explicit omissions but perceived the text as overly simplified and less professional.
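The mixed-effects analysis reported above corresponds to a model of roughly the following form, a plausible specification assuming random intercepts for institution i and evaluator j (the abstract does not give the exact parameterization), with β₁ the fixed effect of AI-assisted editing:

```latex
\text{score}_{ijk} = \beta_0 + \beta_1\,\mathrm{AI}_{ijk} + u_i + v_j + \varepsilon_{ijk},
\qquad u_i \sim \mathcal{N}(0,\sigma_u^2),\quad
v_j \sim \mathcal{N}(0,\sigma_v^2),\quad
\varepsilon_{ijk} \sim \mathcal{N}(0,\sigma_\varepsilon^2)
```

Here AI is an indicator (0 = original form, 1 = LLM-edited form), so the reported β₁ = -0.371 for risk descriptions is the estimated score change attributable to editing after accounting for institutional and evaluator variability.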

CONCLUSIONS

Although LLM-assisted surgical consent forms significantly enhance readability, they may compromise certain aspects of content completeness, particularly in risk disclosure. These findings highlight the need for a balanced approach that maintains accessibility while ensuring medical and legal accuracy. Future research should include patient-centered evaluations to assess comprehension and informed decision-making as well as broader multilingual validation to determine LLM applicability across diverse health care settings.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5bdb/12200805/4ff8c1f19ebd/jmir-v27-e73222-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5bdb/12200805/93165e8302ac/jmir-v27-e73222-g001.jpg

Similar Articles

1
Large Language Model-Assisted Surgical Consent Forms in Non-English Language: Content Analysis and Readability Evaluation.
J Med Internet Res. 2025 Jun 19;27:e73222. doi: 10.2196/73222.
2
Enhancing the Readability of Online Patient Education Materials Using Large Language Models: Cross-Sectional Study.
J Med Internet Res. 2025 Jun 4;27:e69955. doi: 10.2196/69955.
3
Can artificial intelligence improve the readability of patient education information in gynecology?
Am J Obstet Gynecol. 2025 Jun 25. doi: 10.1016/j.ajog.2025.06.047.
4
Intravenous magnesium sulphate and sotalol for prevention of atrial fibrillation after coronary artery bypass surgery: a systematic review and economic evaluation.
Health Technol Assess. 2008 Jun;12(28):iii-iv, ix-95. doi: 10.3310/hta12280.
5
Improving Patient Communication by Simplifying AI-Generated Dental Radiology Reports With ChatGPT: Comparative Study.
J Med Internet Res. 2025 Jun 9;27:e73337. doi: 10.2196/73337.
6
Artificial Intelligence Shows Limited Success in Improving Readability Levels of Spanish-language Orthopaedic Patient Education Materials.
Clin Orthop Relat Res. 2025 Feb 11. doi: 10.1097/CORR.0000000000003413.
7
Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study.
J Med Internet Res. 2025 Jun 4;27:e67489. doi: 10.2196/67489.
8
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
9
Using a Large Language Model for Breast Imaging Reporting and Data System Classification and Malignancy Prediction to Enhance Breast Ultrasound Diagnosis: Retrospective Study.
JMIR Med Inform. 2025 Jun 11;13:e70924. doi: 10.2196/70924.
10
Performance of ChatGPT-4o and Four Open-Source Large Language Models in Generating Diagnoses Based on China's Rare Disease Catalog: Comparative Study.
J Med Internet Res. 2025 Jun 18;27:e69929. doi: 10.2196/69929.
