Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China.
Department of Pediatrics, Children's Hospital, Chongqing Medical University, Chongqing, People's Republic of China.
Hernia. 2023 Dec;27(6):1587-1599. doi: 10.1007/s10029-023-02900-1. Epub 2023 Oct 16.
This study utilized ChatGPT, an artificial intelligence program based on large language models, to explore controversial issues in pediatric inguinal hernia surgery and compare its responses with the guidelines of the European Association of Pediatric Surgeons (EUPSA).
Six contentious issues raised by EUPSA were submitted to ChatGPT 4.0 for analysis, with two independent responses generated for each issue. These generated answers were then compared with systematic reviews and guidelines. To ensure accuracy and reliability, a content analysis was conducted and expert evaluations were solicited for validation. The content analysis assessed the consistency or discrepancy between ChatGPT 4.0's responses and the guidelines; an expert scoring method assessed the quality, reliability, and applicability of the responses; and a TF-IDF model tested the stability and consistency of the two responses to each question.
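The TF-IDF consistency check described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: it builds TF-IDF vectors for two hypothetical responses to the same question and scores their similarity with cosine distance (the example texts and the smoothed IDF weighting are assumptions).

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict term -> weight) per document."""
    tfs = [Counter(doc.lower().split()) for doc in docs]
    vocab = set().union(*tfs)
    n = len(docs)
    # Smoothed inverse document frequency: rarer terms weigh more.
    idf = {t: math.log(n / sum(1 for tf in tfs if t in tf)) + 1.0 for t in vocab}
    return [{t: tf[t] * idf[t] for t in tf} for tf in tfs]

def cosine(a, b):
    """Cosine similarity between two sparse vectors, in [0, 1] here."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical pair of generated responses to the same clinical question.
r1 = "laparoscopic repair is recommended for bilateral inguinal hernia"
r2 = "open repair is recommended for unilateral inguinal hernia"
v1, v2 = tfidf_vectors([r1, r2])
print(round(cosine(v1, v2), 2))
```

Averaging such pairwise scores over all questions yields a consistency measure like the 0.72 reported in the results; a score near 1 means the two independently generated answers use largely the same terminology.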
The responses generated by ChatGPT 4.0 were mostly consistent with the guidelines, although some differences and contradictions were noted. The average quality score was 3.33, the reliability score 2.75, and the applicability score 3.46 (out of 5). The average similarity between the two responses was 0.72 (out of 1). Content analysis and expert ratings yielded consistent conclusions, strengthening the credibility of the findings.
ChatGPT can provide valuable responses to clinical questions, but it has limitations and requires further improvement. Combining ChatGPT with other reliable data sources is recommended to support clinical practice and decision-making.