
Can OpenAI's New o1 Model Outperform Its Predecessors in Common Eye Care Queries?

Authors

Pushpanathan Krithi, Zou Minjie, Srinivasan Sahana, Wong Wendy Meihua, Mangunkusumo Erlangga Ariadarma, Thomas George Naveen, Lai Yien, Sun Chen-Hsin, Lam Janice Sing Harn, Tan Marcus Chun Jin, Lin Hazel Anne Hui'En, Ma Weizhi, Koh Victor Teck Chang, Chen David Ziyou, Tham Yih-Chung

Affiliations

Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.

Centre for Innovation and Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore.

Publication information

Ophthalmol Sci. 2025 Feb 22;5(4):100745. doi: 10.1016/j.xops.2025.100745. eCollection 2025 Jul-Aug.

DOI: 10.1016/j.xops.2025.100745
PMID: 40291392
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12022690/

Abstract

OBJECTIVE

The newly launched OpenAI o1 is said to offer improved reasoning, potentially providing higher quality responses to eye care queries. However, its performance remains unassessed. We evaluated the performance of o1, ChatGPT-4o, and ChatGPT-4 in addressing ophthalmic-related queries, focusing on correctness, completeness, and readability.

DESIGN

Cross-sectional study.

SUBJECTS

Sixteen queries that ChatGPT-4 had answered suboptimally in prior studies were used, covering 3 subtopics: myopia (6 questions), ocular symptoms (4 questions), and retinal conditions (6 questions).

METHODS

For each subtopic, 3 attending-level ophthalmologists, masked to the model sources, evaluated the responses based on correctness, completeness, and readability (on a 5-point scale for each metric).

MAIN OUTCOME MEASURES

Mean summed scores of each model for correctness, completeness, and readability, rated on a 5-point scale (maximum score: 15).
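
The outcome measure above can be read as follows: for each response and each metric, the three graders' 1-to-5 ratings are added (giving a value from 3 to 15), and these sums are then averaged over the questions. The Python sketch below illustrates that reading only; the function names and the example ratings are hypothetical and are not taken from the study.

```python
from statistics import mean

def summed_score(ratings):
    """Add three graders' 1-5 ratings for one response on one metric (range 3-15)."""
    if len(ratings) != 3 or any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("expected exactly three ratings between 1 and 5")
    return sum(ratings)

def mean_summed_score(per_question_ratings):
    """Average the summed scores over all questions for one model and one metric."""
    return mean(summed_score(r) for r in per_question_ratings)

# Hypothetical ratings for three questions (assumption: not data from the study).
example_ratings = [(5, 4, 4), (3, 4, 5), (5, 5, 5)]
print(mean_summed_score(example_ratings))
```

Under this reading, a reported value such as 12.6 for correctness is an average of per-question sums, each bounded by 15.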

RESULTS

o1 scored highest in correctness (12.6) and readability (14.2), outperforming ChatGPT-4, which scored 10.3 (P = 0.010) and 12.4 (P < 0.001), respectively. No significant difference was found between o1 and ChatGPT-4o. When stratified by subtopic, o1 consistently demonstrated superior correctness and readability. In completeness, ChatGPT-4o achieved the highest score (12.4), followed by o1 (10.8), though the difference was not statistically significant. o1 showed notable limitations in completeness for ocular symptom queries, scoring 5.5 out of 15.

CONCLUSIONS

While o1 is marketed as offering improved reasoning capabilities, its performance in addressing eye care queries does not significantly differ from its predecessor, ChatGPT-4o. Nevertheless, it surpasses ChatGPT-4, particularly in correctness and readability.

FINANCIAL DISCLOSURES

Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.



