Hisamatsu Takashi, Fukuda Mari, Kinuta Minako, Kanda Hideyuki
Department of Public Health, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences.
J Atheroscler Thromb. 2025 May 1;32(5):567-579. doi: 10.5551/jat.65240. Epub 2024 Oct 30.
Artificial intelligence is increasingly used in the medical field. We assessed the accuracy and reproducibility of responses by ChatGPT to clinical questions (CQs) in the Japan Atherosclerosis Society Guidelines for Prevention of Atherosclerotic Cardiovascular Diseases 2022 (JAS Guidelines 2022).
In June 2024, we assessed responses by ChatGPT (version 3.5) to CQs, comprising background questions (BQs) and foreground questions (FQs). Three researchers independently rated accuracy on six-point Likert scales ranging from 1 ("completely incorrect") to 6 ("completely correct"), evaluating responses to CQs posed in Japanese and in English translation. For the reproducibility assessment, each CQ was asked five times, each in a new chat; the responses were scored on the same six-point Likert scales, and Fleiss kappa coefficients were calculated.
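The Fleiss kappa statistic used here measures agreement beyond chance across the five repeated runs per CQ. A minimal sketch of the standard calculation is below; the ratings matrix is illustrative toy data, not the study's data, and the binning of Likert scores into categories is an assumption for the example.

```python
# Hedged sketch: Fleiss' kappa for agreement across repeated runs.
# counts[i][j] = number of runs that gave subject i (a CQ) category j
# (a Likert score); every row must sum to n, the runs per subject.

def fleiss_kappa(counts):
    N = len(counts)          # number of subjects (CQs)
    n = sum(counts[0])       # ratings per subject (repeated runs)
    k = len(counts[0])       # number of categories (Likert levels)
    # mean observed per-subject agreement
    P_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in counts) / N
    # chance agreement from the marginal category proportions
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 4 CQs, each asked 5 times, scores on a 6-point scale
ratings = [
    [0, 0, 0, 0, 5, 0],   # all five runs scored this CQ a 5
    [0, 0, 0, 1, 4, 0],
    [0, 0, 0, 0, 0, 5],
    [0, 0, 1, 0, 4, 0],
]
print(round(fleiss_kappa(ratings), 3))
```

Kappa near 1 indicates the model gives nearly identical scores on repeated asking; values around 0 indicate agreement no better than chance.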
The median (25th-75th percentile) scores for ChatGPT's responses to BQs and FQs were 4 (3-5) and 5 (5-6) for Japanese CQs and 5 (3-6) and 6 (5-6) for English CQs, respectively. Response scores were higher for FQs than for BQs (P<0.001 for both Japanese and English). Accuracy was similar between Japanese and English CQs (P=0.139 for BQs; P=0.586 for FQs). Kappa coefficients for reproducibility were 0.76 for BQs and 0.90 for FQs.
ChatGPT showed high accuracy and reproducibility in responding to JAS Guidelines 2022 CQs, especially FQs. While ChatGPT primarily reflects existing guidelines, its strength could lie in rapidly organizing and presenting relevant information, thus supporting instant and more efficient guideline interpretation and aiding in medical decision-making.