Large Language Models Can Extract Metadata for Annotation of Human Neuroimaging Publications.

Authors

Turner Matthew D, Appaji Abhishek, Rakib Nibras Ar, Golnari Pedram, Rajasekar Arcot K, Rathnam K V Anitha, Sahoo Satya S, Wang Yue, Wang Lei, Turner Jessica A

Affiliations

Department of Psychiatry, The Ohio State University, Columbus, Ohio, USA.

Department of Medical Electronics Engineering, B.M.S. College of Engineering, Bengaluru, India.

Publication information

bioRxiv. 2025 May 14:2025.05.13.653828. doi: 10.1101/2025.05.13.653828.

DOI: 10.1101/2025.05.13.653828
PMID: 40462943
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12132202/
Abstract

We show that recent (mid-to-late 2024) commercial large language models (LLMs) can perform good-quality metadata extraction and annotation, with very little work on the part of investigators, for several exemplar real-world annotation tasks in the neuroimaging literature. We investigated OpenAI's GPT-4o, which performed comparably to several groups of specially trained and supervised human annotators. The LLM achieved performance similar to that of humans, scoring between 0.91 and 0.97 on zero-shot prompts without feedback to the LLM. Reviewing the disagreements between the LLM and the gold-standard human annotations, we note that actual LLM errors are comparable to human errors in most cases, and that in many cases the disagreements are not errors at all. For the specific types of annotations we tested, which have carefully reviewed gold-standard values, the LLM's performance is usable for metadata annotation at scale. We encourage other research groups to develop and make available more specialized "micro-benchmarks," like the ones we provide here, for testing the performance of both LLMs and more complex agent systems on real-world metadata annotation tasks.
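The comparison described in the abstract, scoring LLM annotations against gold-standard human annotations and then reviewing the disagreements, can be illustrated with a minimal sketch. The abstract does not name the metric behind the 0.91 to 0.97 figures, so simple proportion agreement is used here purely as an assumption. The function names, the normalization rule, and the example labels are all hypothetical and are not taken from the paper.

```python
def normalize(label):
    """Hypothetical normalization: ignore case and surrounding whitespace."""
    return label.strip().lower()


def agreement_rate(model_labels, gold_labels):
    """Fraction of items where the model label matches the gold label."""
    if len(model_labels) != len(gold_labels):
        raise ValueError("label lists must have the same length")
    if not gold_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(
        normalize(m) == normalize(g) for m, g in zip(model_labels, gold_labels)
    )
    return matches / len(gold_labels)


def disagreements(model_labels, gold_labels):
    """List (index, model_label, gold_label) for every mismatch, for manual review."""
    return [
        (i, m, g)
        for i, (m, g) in enumerate(zip(model_labels, gold_labels))
        if normalize(m) != normalize(g)
    ]


# Made-up example labels (imaging modality per paper), not data from the study.
gold = ["fMRI", "sMRI", "DTI", "fMRI"]
model = ["fmri", "sMRI", "fMRI", "fMRI "]

rate = agreement_rate(model, gold)
to_review = disagreements(model, gold)
print(rate)
print(to_review)
```

Listing the mismatches separately reflects the abstract's point that a disagreement with the gold standard is not necessarily an LLM error, so each one is worth a manual look before it is counted against the model.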

Similar articles

1
Large Language Models Can Extract Metadata for Annotation of Human Neuroimaging Publications.
bioRxiv. 2025 May 14:2025.05.13.653828. doi: 10.1101/2025.05.13.653828.
2
Large language models can extract metadata for annotation of human neuroimaging publications.
Front Neuroinform. 2025 Aug 20;19:1609077. doi: 10.3389/fninf.2025.1609077. eCollection 2025.
3
Data extraction from free-text stroke CT reports using GPT-4o and Llama-3.3-70B: the impact of annotation guidelines.
Eur Radiol Exp. 2025 Jun 19;9(1):61. doi: 10.1186/s41747-025-00600-2.
4
Large Language Models and Empathy: Systematic Review.
J Med Internet Res. 2024 Dec 11;26:e52597. doi: 10.2196/52597.
5
Using a Diverse Test Suite to Assess Large Language Models on Fast Health Care Interoperability Resources Knowledge: Comparative Analysis.
J Med Internet Res. 2025 Aug 12;27:e73540. doi: 10.2196/73540.
6
Performance and Reproducibility of Large Language Models in Named Entity Recognition: Considerations for the Use in Controlled Environments.
Drug Saf. 2025 Mar;48(3):287-303. doi: 10.1007/s40264-024-01499-1. Epub 2024 Dec 11.
7
Implementing Large Language Models in Health Care: Clinician-Focused Review With Interactive Guideline.
J Med Internet Res. 2025 Jul 11;27:e71916. doi: 10.2196/71916.
8
Large Language Models for Psychiatric Phenotype Extraction from Electronic Health Records.
medRxiv. 2025 Aug 12:2025.08.07.25333172. doi: 10.1101/2025.08.07.25333172.
9
Large Language Model Symptom Identification From Clinical Text: Multicenter Study.
J Med Internet Res. 2025 Jul 31;27:e72984. doi: 10.2196/72984.
10
Use of Large Language Models to Classify Epidemiological Characteristics in Synthetic and Real-World Social Media Posts About Conjunctivitis Outbreaks: Infodemiology Study.
J Med Internet Res. 2025 Jul 2;27:e65226. doi: 10.2196/65226.
