Stocker Markus, Snyder Lauren, Anfuso Matthew, Ludwig Oliver, Thießen Freya, Farfar Kheir Eddine, Haris Muhammad, Oelen Allard, Jaradeh Mohamad Yaser
TIB - Leibniz Information Centre for Science and Technology, 30167, Hannover, Germany.
Leibniz University Hannover, Institute of Data Science, 30167, Hannover, Germany.
Sci Data. 2025 Apr 30;12(1):677. doi: 10.1038/s41597-025-04905-0.
Scientific literature is the primary expression of scientific knowledge and an important source of research data. However, scientific knowledge expressed in narrative text documents is not inherently machine readable. To facilitate knowledge reuse, knowledge must be extracted from articles and organized into databases post-publication. The high time costs and inaccuracies associated with completing these activities manually have driven the development of techniques that automate knowledge extraction. Tackling the problem with a different mindset, we propose a pre-publication approach, known as reborn, that ensures scientific knowledge is born readable, i.e., produced in a machine-readable format with formal data syntax during knowledge production. We implement the approach using the Open Research Knowledge Graph infrastructure for FAIR scientific knowledge organization. With a focus on statistical research findings, we test the approach with three use cases in soil science, computer science, and agroecology. Our results suggest that the proposed approach is superior to classical manual and semi-automated post-publication extraction techniques in terms of knowledge accuracy, richness, and reproducibility as well as technological simplicity.
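To illustrate what producing a statistical finding in machine-readable form during knowledge production could look like, the following is a minimal sketch. It is not the reborn implementation and does not use the Open Research Knowledge Graph API. The `ex:` vocabulary, the function names `welch_t` and `describe_finding`, and the sample values are all hypothetical and chosen only for illustration.

```python
import json
from math import sqrt
from statistics import mean, stdev


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = stdev(a) ** 2 / len(a)  # squared standard error of group a
    vb = stdev(b) ** 2 / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def describe_finding(label, a, b):
    """Package the test result as a JSON-LD-style record at analysis time."""
    t, df = welch_t(a, b)
    return {
        # Hypothetical vocabulary; a real deployment would use a shared schema.
        "@context": {"ex": "https://example.org/vocab#"},
        "@type": "ex:TwoSampleTTest",
        "ex:label": label,
        "ex:groupMeans": [mean(a), mean(b)],
        "ex:tStatistic": t,
        "ex:degreesOfFreedom": df,
    }


# Made-up measurements, used only to exercise the sketch.
control = [4.1, 3.9, 4.3, 4.0]
treated = [4.8, 5.1, 4.9, 5.2]

finding = describe_finding("Illustrative comparison", control, treated)
print(json.dumps(finding, indent=2))
```

The point of the sketch is that the structured record is written by the same script that computes the statistic, so nothing has to be extracted from narrative text afterwards.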