ZB MED - Information Centre for Life Sciences, Cologne, Germany.
Graduate School Digital Infrastructure in the Life Sciences, Bielefeld Institute for Bioinformatics Infrastructure, Faculty of Technology, Bielefeld University, Bielefeld, Germany.
J Med Internet Res. 2022 Apr 8;24(4):e34072. doi: 10.2196/34072.
The current COVID-19 crisis underscores the importance of preprints, as they allow research results to be communicated rapidly, without the delay of peer review. To fully integrate this type of publication into library information systems, we developed preVIEW: a publicly available, central search engine for COVID-19-related preprints that clearly distinguishes this source from peer-reviewed publications. The relationship between a preprint and its corresponding journal version should be stored as metadata in both versions so that duplicates can be easily identified and information overload for researchers is reduced.
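Storing the relationship "in both versions" means that a preprint points to its journal article and the journal article points back to the preprint. The Python sketch below models this under some assumptions. Each record has a simplified relation map whose keys borrow Crossref's `is-preprint-of` and `has-preprint` relation types. The snippet reports every link that only one side declares. The `Record` class, the `one_sided_links` helper, and the placeholder DOIs are hypothetical illustrations and are not part of preVIEW.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    doi: str
    # Simplified, Crossref-style relation map (hypothetical structure),
    # e.g. {"is-preprint-of": ["10.0000/journal-a"]}
    relation: dict = field(default_factory=dict)


def one_sided_links(records):
    """Return (preprint_doi, journal_doi) pairs declared by only one side."""
    forward = set()   # links declared by preprints via "is-preprint-of"
    backward = set()  # links declared by journal articles via "has-preprint"
    for rec in records:
        for journal_doi in rec.relation.get("is-preprint-of", []):
            forward.add((rec.doi, journal_doi))
        for preprint_doi in rec.relation.get("has-preprint", []):
            backward.add((preprint_doi, rec.doi))
    # The symmetric difference keeps the links that lack their counterpart
    return sorted(forward ^ backward)
```

A link that appears in both sets is stored consistently. Only the incomplete ones are returned, so they can be completed or flagged.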
In this work, we investigated the extent to which the relationship between preprints and their corresponding journal publications is present in published metadata, how this information can be completed further, and how preVIEW can use it to identify preprints that have since been published in a journal and filter those duplicates from search results.
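Once a preprint-to-journal mapping exists, filtering duplicates from a result set is a simple lookup. The sketch below assumes that search hits are dictionaries with a `doi` key and that the mapping is a plain `dict` from preprint DOI to journal DOI. Both structures and the `apply_republished_filter` name are hypothetical and do not describe preVIEW's actual implementation. Depending on a flag, the function either hides already published preprints or only annotates them.

```python
def apply_republished_filter(hits, journal_version_of, hide_republished=True):
    """Annotate each hit with its journal DOI (or None) and optionally drop republished preprints.

    hits: list of dicts with at least a "doi" key (hypothetical result format)
    journal_version_of: dict mapping preprint DOI -> journal DOI
    """
    filtered = []
    for hit in hits:
        journal_doi = journal_version_of.get(hit["doi"])
        if hide_republished and journal_doi is not None:
            continue  # a peer-reviewed version exists, so skip the duplicate
        # Copy the hit so the caller's result list is not mutated
        filtered.append({**hit, "journal_version": journal_doi})
    return filtered
```

Keeping the annotation even when hits are not hidden lets the result overview show the link to the journal version. The flag then acts as a user-facing filter option.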
We first analyzed the information available from the preprint servers themselves and the information that can be retrieved via Crossref. We then developed the Pre2Pub algorithm to find the corresponding peer-reviewed article for each preprint. We integrated the results from these different resources into our search engine preVIEW, presented the information in the result set overview, and added corresponding filter options.
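The abstract does not specify how Pre2Pub decides that a journal article matches a preprint. The Python sketch below is therefore an illustrative stand-in and not the Pre2Pub algorithm. It applies three hypothetical heuristics. Titles are normalized and compared with `difflib.SequenceMatcher`. The first author's family name must match. Journal articles dated before the preprint are skipped. The `Article` class, `best_journal_match`, and the 0.9 threshold are all assumptions.

```python
import re
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Article:
    doi: str
    title: str
    first_author: str  # family name of the first author
    published: date


def normalize(title: str) -> str:
    # Lowercase and keep only alphanumeric tokens, so punctuation changes
    # between the preprint and journal titles are ignored
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def best_journal_match(preprint: Article, candidates: List[Article],
                       min_similarity: float = 0.9) -> Optional[Article]:
    """Return the most similar candidate that passes all heuristics, or None."""
    best, best_score = None, 0.0
    for cand in candidates:
        if cand.published < preprint.published:
            continue  # assumed heuristic: skip articles dated before the preprint
        if cand.first_author.lower() != preprint.first_author.lower():
            continue  # assumed heuristic: first authors must agree
        score = SequenceMatcher(None, normalize(preprint.title),
                                normalize(cand.title)).ratio()
        if score >= min_similarity and score > best_score:
            best, best_score = cand, score
    return best
```

Returning `None` when no candidate clears the threshold favors precision over recall. This matches the goal of adding links only when the match is reliable.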
Preprints have found their place in publication workflows; however, the link from a preprint to its corresponding journal publication is not fully captured in the metadata of the preprint servers or in Crossref. Our algorithm Pre2Pub finds approximately 16% more related journal articles, with a precision of 99.27%. We also integrate this information transparently into preVIEW so that researchers can use it in their searches.
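Two helper functions make the reported figures concrete. The first is the standard definition of precision: true positives divided by all returned matches. The second gives one plausible reading of "16% more": additional links found, relative to the links already present in existing metadata. That choice of baseline is an assumption, not something the abstract states. Any counts passed to these helpers are illustrative and are not data from the study.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of returned preprint-journal matches that are correct."""
    returned = true_positives + false_positives
    if returned == 0:
        raise ValueError("precision is undefined when nothing is returned")
    return true_positives / returned


def relative_gain(baseline_links: int, additional_links: int) -> float:
    """Additional links as a fraction of the links already known.

    Assumption: the baseline is the set of links already present in
    preprint-server and Crossref metadata.
    """
    if baseline_links <= 0:
        raise ValueError("baseline must be positive")
    return additional_links / baseline_links
```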
The relationship between a preprint and its journal version is valuable information. It can help researchers find, among preprints, only the information that is not already available in a peer-reviewed version. As long as there is no transparent and complete way to store this relationship in metadata, the Pre2Pub algorithm is a suitable extension for retrieving this information.