Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo.

Authors

Rampal Nakul, Wang Kaiyu, Burigana Matthew, Hou Lingxiang, Al-Johani Juri, Sackmann Anna, Murayshid Hanan S, AlSumari Walaa A, AlAbdulkarim Arwa M, Alhazmi Nahla E, Alawad Majed O, Borgs Christian, Chayes Jennifer T, Yaghi Omar M

Affiliations

Department of Chemistry, University of California, Berkeley, California 94720, United States.

Kavli Energy Nanoscience Institute, University of California, Berkeley, California 94720, United States.

Publication information

J Chem Theory Comput. 2024 Oct 22;20(20):9128-9137. doi: 10.1021/acs.jctc.4c00805. Epub 2024 Oct 8.

DOI: 10.1021/acs.jctc.4c00805
PMID: 39377539

Abstract

The rapid advancement in artificial intelligence and natural language processing has led to the development of large-scale datasets aimed at benchmarking the performance of machine learning models. Herein, we introduce "RetChemQA", a comprehensive benchmark dataset designed to evaluate the capabilities of such models in the domain of reticular chemistry. This dataset includes both single-hop and multi-hop question-answer pairs, encompassing approximately 45,000 question-answer pairs (Q&As) for each type. The questions have been extracted from an extensive corpus of literature containing about 2,530 research papers from publishers including NAS, ACS, RSC, Elsevier, and Nature Publishing Group, among others. The dataset has been generated using OpenAI's GPT-4 Turbo, a cutting-edge model known for its exceptional language understanding and generation capabilities. In addition to the Q&A dataset, we also release a dataset of synthesis conditions extracted from the corpus of literature used in this study. The aim of RetChemQA is to provide a robust platform for the development and evaluation of advanced machine learning algorithms, particularly for the reticular chemistry community. The dataset is structured to reflect the complexities and nuances of real-world scientific discourse, thereby enabling nuanced performance assessments across a variety of tasks.
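The abstract distinguishes single-hop from multi-hop Q&A pairs but does not describe how the records are stored. As a rough illustration only, the sketch below assumes the usual meaning of the terms: a single-hop answer can be justified from one passage, while a multi-hop answer requires combining several. The `QARecord` class, its field names, and the toy records are hypothetical and are not taken from RetChemQA.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QARecord:
    """Hypothetical Q&A record; field names are illustrative, not RetChemQA's schema."""
    question: str
    answer: str
    # Passages from the source paper needed to justify the answer.
    supporting_passages: List[str] = field(default_factory=list)

    @property
    def hop_type(self) -> str:
        # Single-hop: one passage suffices; multi-hop: several must be combined.
        return "single-hop" if len(self.supporting_passages) <= 1 else "multi-hop"


def split_by_hop(records: List[QARecord]) -> Dict[str, List[QARecord]]:
    """Group records into single-hop and multi-hop lists."""
    groups: Dict[str, List[QARecord]] = {"single-hop": [], "multi-hop": []}
    for rec in records:
        groups[rec.hop_type].append(rec)
    return groups


# Toy records written for this sketch (assumption: not real dataset entries).
records = [
    QARecord(
        "Which metal does framework A contain?",
        "Zinc",
        ["Framework A is built from zinc-oxo clusters."],
    ),
    QARecord(
        "Which metal is in the framework that was activated under vacuum?",
        "Zinc",
        [
            "Framework A is built from zinc-oxo clusters.",
            "Only framework A was activated under vacuum.",
        ],
    ),
]

groups = split_by_hop(records)
print({name: len(items) for name, items in groups.items()})
```

Keeping the supporting passages alongside each answer is one possible design choice: it lets a benchmark score a model on whether it used the right evidence as well as on the final answer.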

Similar articles

1
Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo.
J Chem Theory Comput. 2024 Oct 22;20(20):9128-9137. doi: 10.1021/acs.jctc.4c00805. Epub 2024 Oct 8.
2
Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study.
JMIR Med Inform. 2024 Sep 4;12:e59258. doi: 10.2196/59258.
3
Stratified Evaluation of GPT's Question Answering in Surgery Reveals Artificial Intelligence (AI) Knowledge Gaps.
Cureus. 2023 Nov 14;15(11):e48788. doi: 10.7759/cureus.48788. eCollection 2023 Nov.
4
The performance of ChatGPT on orthopaedic in-service training exams: A comparative study of the GPT-3.5 turbo and GPT-4 models in orthopaedic education.
J Orthop. 2023 Nov 23;50:70-75. doi: 10.1016/j.jor.2023.11.056. eCollection 2024 Apr.
5
Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard.
JMIR Med Educ. 2024 Feb 21;10:e51523. doi: 10.2196/51523.
6
HQA-Data: A historical question answer generation dataset from previous multi perspective conversation.
Data Brief. 2023 May 18;48:109245. doi: 10.1016/j.dib.2023.109245. eCollection 2023 Jun.
7
GPT-4 Turbo with Vision fails to outperform text-only GPT-4 Turbo in the Japan Diagnostic Radiology Board Examination.
Jpn J Radiol. 2024 Aug;42(8):918-926. doi: 10.1007/s11604-024-01561-z. Epub 2024 May 11.
8
Evaluating the OpenAI's GPT-3.5 Turbo's performance in extracting information from scientific articles on diabetic retinopathy.
Syst Rev. 2024 May 16;13(1):135. doi: 10.1186/s13643-024-02523-2.
9
Large Language Models for Therapy Recommendations Across 3 Clinical Specialties: Comparative Study.
J Med Internet Res. 2023 Oct 30;25:e49324. doi: 10.2196/49324.
10
Assessing GPT-4's Performance in Delivering Medical Advice: Comparative Analysis With Human Experts.
JMIR Med Educ. 2024 Jul 8;10:e51282. doi: 10.2196/51282.

Cited by

1
Annotated textual dataset PV600 of perovskite bandgaps for information extraction from literature.
Sci Data. 2025 Aug 11;12(1):1401. doi: 10.1038/s41597-025-05637-x.