Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA.
Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA.
Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae033.
As biological data increase, we need additional infrastructure to share them and promote interoperability. While major effort has been put into sharing data, comparatively little emphasis has been placed on sharing metadata. Yet sharing metadata is also important, and in some ways has a wider scope than sharing the data themselves.
Here, we present PEPhub, an approach to improve sharing and interoperability of biological metadata. PEPhub provides an API, natural-language search, and user-friendly web-based sharing and editing of sample metadata tables. We used PEPhub to process more than 100,000 published biological research projects and index them with fast semantic natural-language search. PEPhub thus provides a fast and user-friendly way to find existing biological research data or to share new data.
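As a rough illustration of the idea behind vector-based search over project metadata, the sketch below ranks a few invented project descriptions against a free-text query by cosine similarity. It is not PEPhub's implementation or API: the project names, the descriptions, and the `embed`, `cosine`, and `search` helpers are all hypothetical. A bag-of-words count vector stands in for a learned sentence embedding, so this toy version only matches shared words, whereas a real semantic search would also match related terms.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, projects, top_k=2):
    """Return up to top_k project names ranked by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(desc)), name) for name, desc in projects.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]


# Hypothetical project descriptions, invented for this example.
projects = {
    "demo/atac_liver": "ATAC-seq of mouse liver samples",
    "demo/rna_brain": "RNA-seq of human brain tissue",
    "demo/chip_liver": "ChIP-seq of human liver tissue",
}

results = search("human liver", projects)
print(results)
```

In this sketch the project whose description shares the most words with the query ranks first. Swapping `embed` for a sentence-embedding model would keep the same ranking logic while allowing matches on meaning rather than exact words.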