Linking big biomedical datasets to modular analysis with Portable Encapsulated Projects.

Author information

Center for Public Health Genomics, University of Virginia, VA 22908, USA.

Department of Public Health Sciences, University of Virginia, VA 22908, USA.

Publication information

Gigascience. 2021 Dec 6;10(12). doi: 10.1093/gigascience/giab077.

Abstract

BACKGROUND

Organizing and annotating biological sample data is critical in data-intensive bioinformatics. Unfortunately, metadata formats from a data provider are often incompatible with requirements of a processing tool. There is no broadly accepted standard to organize metadata across biological projects and bioinformatics tools, restricting the portability and reusability of both annotated datasets and analysis software.

RESULTS

To address this, we present the Portable Encapsulated Project (PEP) specification, a formal specification for biological sample metadata structure. The PEP specification accommodates typical features of data-intensive bioinformatics projects with many biological samples. In addition to standardization, the PEP specification provides descriptors and modifiers for project-level and sample-level metadata, which improve portability across both computing environments and data processing tools. PEPs include a schema validator framework, allowing formal definition of required metadata attributes for data analysis broadly. We have implemented packages for reading PEPs in both Python and R to provide a language-agnostic interface for organizing project metadata.

CONCLUSIONS

The PEP specification is an important step toward unifying data annotation and processing tools in data-intensive biological research projects. Links to tools and documentation are available at http://pep.databio.org/.
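As a rough illustration of the ideas in the Results section, the sketch below applies project-level modifiers to a sample table and then checks each sample for required attributes. It is a minimal stand-in using only the Python standard library, not the PEP reference implementation: the column names, the `append` and `derive` keys as used here, the file path template, and the helper functions `load_samples` and `validate` are all assumptions made for this example.

```python
import csv
import io

# Hypothetical sample table; column names and values are illustrative only.
SAMPLE_TABLE = """sample_name,protocol,organism
frog_1,RNA-seq,frog
frog_2,ATAC-seq,frog
"""

# Hypothetical project-level settings (assumed layout, not the official format).
PROJECT = {
    # Constant attributes added to every sample.
    "append": {"genome": "example_genome"},
    # Attributes built from other attributes, so paths are not hard-coded per sample.
    "derive": {"read1": "/data/{organism}/{sample_name}.fastq.gz"},
}

# Attributes a hypothetical processing tool requires on every sample.
REQUIRED_ATTRIBUTES = ["sample_name", "protocol", "read1"]


def load_samples(table_text, project):
    """Read the sample table and apply the project-level modifiers."""
    samples = []
    for row in csv.DictReader(io.StringIO(table_text)):
        sample = dict(row)
        sample.update(project.get("append", {}))
        for attr, template in project.get("derive", {}).items():
            sample[attr] = template.format(**sample)
        samples.append(sample)
    return samples


def validate(samples, required):
    """Return one message per missing or empty required attribute."""
    errors = []
    for index, sample in enumerate(samples):
        name = sample.get("sample_name", "?")
        for attr in required:
            if not sample.get(attr):
                errors.append(f"sample {index} ({name}): missing '{attr}'")
    return errors


samples = load_samples(SAMPLE_TABLE, PROJECT)
errors = validate(samples, REQUIRED_ATTRIBUTES)
```

Keeping the path template at the project level means that moving the data to another computing environment requires changing one line rather than every row of the sample table, which is the kind of portability the abstract describes. The required-attribute list plays the role of a tool-specific schema in this sketch.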

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2314/8673555/ec830d549bdc/giab077fig1.jpg
