Anees Muhammad, Shaikh Fareed Ahmed, Shaikh Hafsah, Siddiqui Nadeem Ahmed, Rehman Zia Ur
Section of Vascular Surgery, Department of Surgery, Aga Khan University Hospital, Karachi, Pakistan.
J Vasc Surg Venous Lymphat Disord. 2025 Jan;13(1):101985. doi: 10.1016/j.jvsv.2024.101985. Epub 2024 Sep 25.
This study aimed to evaluate the accuracy and reproducibility of the information ChatGPT provides in response to frequently asked questions about radiofrequency ablation (RFA) for varicose veins.
This cross-sectional study was conducted at The Aga Khan University Hospital, Karachi, Pakistan. A set of 18 frequently asked questions regarding RFA for varicose veins was compiled from credible online sources and presented to ChatGPT twice, each time in a separate session opened with the new chat option. Twelve experienced vascular surgeons (with >2 years of experience and ≥20 RFA procedures performed annually) independently evaluated the accuracy of the responses using a 4-point Likert scale and assessed their reproducibility.
Most evaluators were male (n = 10/12 [83.3%]), with an average of 12.3 ± 6.2 years of experience as vascular surgeons. Six evaluators (50.0%) were from the UK, followed by three from Saudi Arabia (25.0%), two from Pakistan (16.7%), and one from the United States (8.3%). Of the 216 accuracy grades, most rated the responses as comprehensive (n = 87/216 [40.3%]) or accurate but insufficient (n = 70/216 [32.4%]), whereas 17.1% (n = 37/216) were graded as a mixture of accurate and inaccurate information and 10.2% (n = 22/216) as entirely inaccurate. Overall, 89.8% of the responses (n = 194/216) were deemed reproducible. Of the total responses, 70.4% (n = 152/216) were classified as good quality and reproducible. The remaining responses were poor quality, with 19.4% reproducible (n = 42/216) and 10.2% nonreproducible (n = 22/216). Inter-rater agreement among the vascular surgeons on the overall responses was no better than chance (Fleiss' kappa, -0.028; P = .131).
ChatGPT provided generally accurate and reproducible information on RFA for varicose veins; however, variability in response quality and limited inter-rater reliability highlight the need for further improvements. Although it has the potential to enhance patient education and support healthcare decision-making, improvements in its training, validation, transparency, and mechanisms to address inaccurate or incomplete information are essential.
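The percentages and the Fleiss' kappa statistic reported above can be illustrated with a minimal Python sketch. The four grade counts are taken from the abstract; the per-question rating matrix behind the reported kappa is not given there, so the `toy` matrix and the names `grade_counts`, `percentages`, and `fleiss_kappa` are hypothetical and serve only to show how the statistic is computed, not to reproduce the authors' analysis.

```python
from typing import Dict, List

# Grade counts as reported in the abstract (12 raters x 18 questions = 216 grades).
grade_counts: Dict[str, int] = {
    "comprehensive": 87,
    "accurate but insufficient": 70,
    "mixed accurate/inaccurate": 37,
    "entirely inaccurate": 22,
}


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def fleiss_kappa(table: List[List[int]]) -> float:
    """Fleiss' kappa for a table where table[i][j] is the number of raters
    who assigned item i to category j. Every row must sum to the same
    number of raters."""
    n_items = len(table)
    n_raters = sum(table[0])
    if any(sum(row) != n_raters for row in table):
        raise ValueError("every item must be rated by the same number of raters")

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement: sum of squared overall category proportions.
    n_cats = len(table[0])
    p_cats = [
        sum(row[j] for row in table) / (n_items * n_raters) for j in range(n_cats)
    ]
    p_expected = sum(p * p for p in p_cats)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    print(percentages(grade_counts))
    # Hypothetical toy data: 3 questions, 12 raters, 4 grades, evenly split ratings.
    toy = [[3, 3, 3, 3], [3, 3, 3, 3], [3, 3, 3, 3]]
    print(fleiss_kappa(toy))
```

A kappa close to zero, as in the study, means the raters agreed about as often as would be expected by chance; the evenly split toy table gives a slightly negative value for the same reason.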