Matrix and analysis metadata standards (MAMS) to facilitate harmonization and reproducibility of single-cell data.

Author information

Wang Yichen, Sarfraz Irzam, Teh Wei Kheng, Sokolov Artem, Herb Brian R, Creasy Heather H, Virshup Isaac, Dries Ruben, Degatano Kylee, Mahurkar Anup, Schnell Daniel J, Madrigal Pedro, Hilton Jason, Gehlenborg Nils, Tickle Timothy, Campbell Joshua D

Affiliations

Department of Medicine, Boston University School of Medicine, Boston, MA, USA.

European Bioinformatics Institute, European Molecular Biology Laboratory, Hinxton, Cambridgeshire, UK.

Publication information

bioRxiv. 2023 Mar 7:2023.03.06.531314. doi: 10.1101/2023.03.06.531314.

Abstract

Consortia that seek to characterize healthy and diseased tissues at single-cell resolution are producing a large number of genomic and imaging datasets. While much effort has been devoted to capturing biospecimen information and experimental procedures, metadata standards that describe data matrices and the analysis workflows that produced them are relatively lacking. Detailed metadata schemas for data analysis are needed to facilitate sharing and interoperability across groups and to promote data provenance for reproducibility. To address this need, we developed the Matrix and Analysis Metadata Standards (MAMS) to serve as a resource for data coordinating centers and tool developers. We first curated several simple and complex "use cases" to characterize the types of feature-observation matrices (FOMs), annotations, and analysis metadata produced in different workflows. Based on these use cases, we defined metadata fields to describe the data contained within each matrix, including fields related to processing, modality, and subsets. Suggested terms were created for the majority of fields to aid in harmonization of metadata terms across groups. Additional provenance metadata fields were defined to describe the software and workflows that produced each FOM. Finally, we developed a simple list-like schema that can store MAMS information and can be implemented in multiple formats. Overall, MAMS can be used as a guide to harmonize analysis-related metadata, which will ultimately facilitate integration of datasets across tools and consortia. MAMS specifications, use cases, and examples can be found at https://github.com/single-cell-mams/mams/.
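To make the idea of a list-like schema more concrete, the sketch below shows one way such records might be laid out in Python. It is illustrative only: the field names (`fom_id`, `processing`, `modality`, `obs_subset`, `feature_subset`, `provenance`), the example values, and the tool names are assumptions chosen to mirror the categories named in the abstract, not the official MAMS fields or suggested terms, which are defined in the GitHub repository.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Provenance:
    """Hypothetical record of the software step that produced a matrix."""
    tool: str
    version: str
    parent_fom: Optional[str] = None  # id of the matrix this one was derived from


@dataclass
class FOM:
    """Hypothetical metadata entry for one feature-observation matrix."""
    fom_id: str
    processing: str       # e.g. raw counts vs. normalized values
    modality: str         # e.g. RNA, protein
    obs_subset: str       # which observations (cells) are included
    feature_subset: str   # which features (genes) are included
    provenance: Optional[Provenance] = None


def to_record_list(foms):
    """Serialize the entries as a plain list of dictionaries."""
    return [asdict(f) for f in foms]


def lineage(foms, fom_id):
    """Follow parent links back to the root matrix, newest first."""
    by_id = {f.fom_id: f for f in foms}
    chain = []
    current = fom_id
    while current is not None:
        chain.append(current)
        prov = by_id[current].provenance
        current = prov.parent_fom if prov else None
    return chain


# Illustrative workflow: counts -> normalized -> reduced dimensions.
foms = [
    FOM("fom1", "counts", "rna", "full", "full"),
    FOM("fom2", "lognormalized", "rna", "filtered", "full",
        Provenance("example-normalizer", "0.1.0", parent_fom="fom1")),
    FOM("fom3", "reduction", "rna", "filtered", "variable",
        Provenance("example-pca", "0.1.0", parent_fom="fom2")),
]

records = to_record_list(foms)
as_json = json.dumps(records, indent=2)
```

Because each entry is a flat record with an optional provenance link, the same list could be written to JSON, YAML, or a table without changing its meaning, which is the kind of format independence the abstract describes. Calling `lineage(foms, "fom3")` walks the hypothetical parent links back to the count matrix.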
