Kasakewitch Joao P G, Lima Diego L, Balthazar da Silveira Carlos A, Sanha Valberto, Rasador Ana Caroline, Cavazzola Leandro Totti, Mayol Julio, Malcher Flavio
Department of Surgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA.
Department of Surgery, Montefiore Medical Center, The Bronx, New York, USA.
J Laparoendosc Adv Surg Tech A. 2025 Jun;35(6):437-444. doi: 10.1089/lap.2024.0277. Epub 2025 Apr 26.
This study assesses the reliability of artificial intelligence (AI) large language models (LLMs) in identifying relevant literature comparing inguinal hernia repair techniques. We used LLM chatbots (Bing Chat AI, ChatGPT versions 3.5 and 4.0, and Gemini) to find comparative studies and randomized controlled trials on inguinal hernia repair techniques. The results were then compared with existing systematic reviews (SRs) and meta-analyses and checked for the authenticity of the listed articles. The LLMs retrieved 22 studies from 2006 to 2023 across eight journals, while the SRs encompassed a total of 42 studies. Through thorough external validation, 63.6% of the studies (14 out of 22), comprising 10 identified through ChatGPT 4.0 and 6 via Bing AI (with an overlap of 2 studies between them), were confirmed to be authentic. Conversely, 36.3% (8 out of 22), all produced by Google Gemini (Bard), were revealed to be fabrications, with two (25.0%) of these fabrications mistakenly linked to valid DOIs. Four (28.6%) of the 14 real studies were acknowledged in the SRs, representing 18.1% of all LLM-generated studies. The LLMs missed a total of 38 (90.5%) of the studies included in the previous SRs, while 10 real studies were found by the LLMs but were not included in the previous SRs. Among those 10 studies, 6 were reviews and 1 was published after the SRs, leaving a total of three comparative studies missed by the reviews. This study reveals the mixed reliability of AI language models in scientific searches, emphasizing the need for cautious application of AI in academia and continuous evaluation of AI tools in scientific investigations.
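The abstract does not specify how the authenticity check was performed, but a finding like "fabricated citations linked to valid DOIs" implies verifying each LLM-supplied DOI against a bibliographic registry. A minimal sketch of such a check against the public CrossRef REST API is shown below; the function names are illustrative, and a real workflow would also compare the returned title and authors against the LLM's claimed citation (a DOI can resolve yet belong to a different paper):

```python
import re
import urllib.error
import urllib.parse
import urllib.request

# A DOI starts with a registrant prefix "10.NNNN..." followed by "/" and a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(doi: str) -> bool:
    """Cheap syntactic check before any network call."""
    return bool(DOI_PATTERN.match(doi))


def doi_is_registered(doi: str, timeout: float = 10.0) -> bool:
    """Return True if CrossRef has a record for this DOI.

    CrossRef's works endpoint answers 200 for registered DOIs
    and 404 for unregistered ones.
    """
    if not looks_like_doi(doi):
        return False
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # syntactically valid but never registered
        raise


if __name__ == "__main__":
    # The DOI of this very article, taken from the citation line above.
    print(looks_like_doi("10.1089/lap.2024.0277"))  # True
    print(looks_like_doi("not-a-doi"))              # False
```

As the study's 25.0% rate of fabricated citations carrying valid DOIs suggests, DOI resolution alone is necessary but not sufficient; matching the resolved metadata to the claimed reference is the step that actually exposes fabrication.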