LifeBridge Health, Sinai Hospital of Baltimore, Rubin Institute for Advanced Orthopedics, Baltimore, Maryland.
Department of Orthopaedic Surgery, St. Luke's University Health Network, Bethlehem, Pennsylvania.
J Arthroplasty. 2024 Sep;39(9S1):S306-S311. doi: 10.1016/j.arth.2024.04.020. Epub 2024 Apr 16.
The use of ChatGPT (Generative Pretrained Transformer), a natural language artificial intelligence model, has gained unparalleled attention, accumulating over 100 million users within months of launch. As such, we aimed to compare the following: 1) orthopaedic surgeons' evaluations of the appropriateness of answers to the most frequently asked patient questions after total hip arthroplasty; and 2) patients' evaluations of ChatGPT and arthroplasty-trained nurses' responses to their postoperative questions.
We prospectively created 60 questions addressing the most commonly asked patient questions following total hip arthroplasty. We obtained answers to each question from arthroplasty-trained nurses and from the ChatGPT-3.5 version. Surgeons graded each set of responses based on clinical judgment as 1) "appropriate," 2) "inappropriate" if the response contained inappropriate information, or 3) "unreliable" if the responses provided inconsistent content. Each patient was given a randomly selected question from the 60 aforementioned questions, with responses provided by ChatGPT and arthroplasty-trained nurses, using a Research Electronic Data Capture survey hosted at our local hospital.
The 3 fellowship-trained surgeons graded 56 out of 60 (93.3%) responses from the arthroplasty-trained nurses and 57 out of 60 (95.0%) from ChatGPT as "appropriate." There were 175 out of 252 (69.4%) patients who were more comfortable following the ChatGPT responses and 77 out of 252 (30.6%) who preferred the arthroplasty-trained nurses' responses. However, 199 out of 252 patients (79.0%) responded that they were "uncertain" with regard to trusting AI to answer their postoperative questions.
ChatGPT provided appropriate answers from a physician perspective. Patients were also more comfortable with the ChatGPT responses than with those from arthroplasty-trained nurses. Ultimately, successful implementation depends on ChatGPT's ability to provide credible information that is consistent with the goals of physician and patient alike.