Rethinking the production and publication of machine-readable expressions of research findings.

Authors

Stocker Markus, Snyder Lauren, Anfuso Matthew, Ludwig Oliver, Thießen Freya, Farfar Kheir Eddine, Haris Muhammad, Oelen Allard, Jaradeh Mohamad Yaser

Affiliations

TIB - Leibniz Information Centre for Science and Technology, 30167, Hannover, Germany.

Leibniz University Hannover, Institute of Data Science, 30167, Hannover, Germany.

Publication information

Sci Data. 2025 Apr 30;12(1):677. doi: 10.1038/s41597-025-04905-0.

DOI: 10.1038/s41597-025-04905-0
PMID: 40307293
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12043899/

Abstract

Scientific literature is the primary expression of scientific knowledge and an important source of research data. However, scientific knowledge expressed in narrative text documents is not inherently machine readable. To facilitate knowledge reuse, knowledge must be extracted from articles and organized into databases post-publication. The high time costs and inaccuracies associated with completing these activities manually have driven the development of techniques that automate knowledge extraction. Tackling the problem with a different mindset, we propose a pre-publication approach, known as reborn, that ensures scientific knowledge is born readable, i.e. produced in a machine-readable format with formal data syntax during knowledge production. We implement the approach using the Open Research Knowledge Graph infrastructure for FAIR scientific knowledge organization. With a focus on statistical research findings, we test the approach with three use cases in soil science, computer science, and agroecology. Our results suggest that the proposed approach is superior to classical manual and semi-automated post-publication extraction techniques in terms of knowledge accuracy, richness, and reproducibility, as well as technological simplicity.
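
As a rough illustration of what producing a statistical finding in a machine-readable form during analysis could look like, the following minimal sketch runs a Welch two-sample t-test and emits the result as a JSON-LD-style record in the same step. It is not the authors' implementation and does not use the Open Research Knowledge Graph API. The function names (`welch_t`, `describe_finding`), the `https://example.org/vocab/` vocabulary, the record layout, and the sample values are all hypothetical and chosen only for illustration.

```python
import json
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a, se2_b = var_a / n_a, var_b / n_b  # squared standard errors
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


def describe_finding(label, sample_a, sample_b):
    """Bundle method, inputs, and outputs of one comparison into a structured record.

    The vocabulary and field names are hypothetical, not an ORKG schema.
    """
    t, df = welch_t(sample_a, sample_b)
    return {
        "@context": {"ex": "https://example.org/vocab/"},  # hypothetical vocabulary
        "@type": "ex:GroupComparison",
        "ex:label": label,
        "ex:method": "Welch two-sample t-test",
        "ex:inputs": {"group_a": list(sample_a), "group_b": list(sample_b)},
        "ex:outputs": {
            "t_statistic": round(t, 4),
            "degrees_of_freedom": round(df, 2),
        },
    }


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    control = [4.1, 3.9, 4.4, 4.0, 4.2]
    treated = [4.8, 5.1, 4.7, 5.0, 4.9]
    record = describe_finding("Illustrative two-group comparison", control, treated)
    print(json.dumps(record, indent=2))
```

Because the record is written by the same script that computes the statistic, the values do not have to be recovered later from narrative text, which is the contrast with post-publication extraction that the abstract draws.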

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2756/12043899/27860ab72ee6/41597_2025_4905_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2756/12043899/58d298832e99/41597_2025_4905_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2756/12043899/23ec0c516068/41597_2025_4905_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2756/12043899/6cf14bcc57c1/41597_2025_4905_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2756/12043899/793247d8a132/41597_2025_4905_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2756/12043899/0c72030ef364/41597_2025_4905_Fig6_HTML.jpg