Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs?

Affiliations

Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany.

Michael-Stifel-Center for Data-Driven and Simulation Science, Jena, Germany.

Publication information

PLoS One. 2021 Mar 24;16(3):e0246099. doi: 10.1371/journal.pone.0246099. eCollection 2021.

DOI: 10.1371/journal.pone.0246099
PMID: 33760822
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7990268/
Abstract

The increasing amount of publicly available research data provides the opportunity to link and integrate data in order to create and prove novel hypotheses, to repeat experiments or to compare recent data to data collected at a different time or place. However, recent studies have shown that retrieving relevant data for data reuse is a time-consuming task in daily research practice. In this study, we explore what hampers dataset retrieval in biodiversity research, a field that produces a large amount of heterogeneous data. In particular, we focus on scholarly search interests and metadata, the primary source of data in a dataset retrieval system. We show that existing metadata currently poorly reflect information needs and therefore are the biggest obstacle in retrieving relevant data. Our findings indicate that for data seekers in the biodiversity domain environments, materials and chemicals, species, biological and chemical processes, locations, data parameters and data types are important information categories. These interests are well covered in metadata elements of domain-specific standards. However, instead of utilizing these standards, large data repositories tend to use metadata standards with domain-independent metadata fields that cover search interests only to some extent. A second problem are arbitrary keywords utilized in descriptive fields such as title, description or subject. Keywords support scholars in a full text search only if the provided terms syntactically match or their semantic relationship to terms used in a user query is known.
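
The abstract's closing point about keywords can be illustrated with a minimal sketch. Everything in it is hypothetical and invented for illustration, not taken from the study: the metadata record, the query, the related-terms table, and the function names. It compares a purely syntactic full-text match on the title, description and subject fields with a match that first expands the query using known term relationships.

```python
# Hypothetical sketch: the record, query, and related-terms table are
# invented for illustration and are not data from the study.

DESCRIPTIVE_FIELDS = ("title", "description", "subject")


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def record_terms(record):
    """Collect all tokens from the descriptive metadata fields of a record."""
    terms = set()
    for field in DESCRIPTIVE_FIELDS:
        terms |= tokenize(record.get(field, ""))
    return terms


def syntactic_match(query, record):
    """Return the query terms that literally occur in the record."""
    return tokenize(query) & record_terms(record)


def semantic_match(query, record, related_terms):
    """Expand each query term with known related terms, then match."""
    expanded = set()
    for term in tokenize(query):
        expanded.add(term)
        expanded |= related_terms.get(term, set())
    return expanded & record_terms(record)


# Assumed example record using scientific vocabulary in its keywords.
record = {
    "title": "Abundance of Quercus robur seedlings in grassland plots",
    "description": "Seedling counts per plot.",
    "subject": "Quercus, grassland",
}

# Assumed lookup table standing in for a thesaurus or ontology.
related_terms = {"oak": {"quercus"}, "meadow": {"grassland"}}

query = "oak meadow"
literal_hits = syntactic_match(query, record)
expanded_hits = semantic_match(query, record, related_terms)
print(sorted(literal_hits), sorted(expanded_hits))
```

In this sketch the literal match finds nothing because the query uses common names while the record uses scientific ones; the record is only found once the relationship between the terms is supplied.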

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1827/7990268/32ca34a23fe8/pone.0246099.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1827/7990268/90bbf4f31499/pone.0246099.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1827/7990268/52f5a0ddc25d/pone.0246099.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1827/7990268/ad15affd2bc7/pone.0246099.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1827/7990268/482c601ce394/pone.0246099.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1827/7990268/c01e9b9756d0/pone.0246099.g006.jpg
