Recommendations for the FAIRification of genomic track metadata.

Affiliations

Center for Bioinformatics, University of Oslo (UiO), Oslo, Norway.

European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.

Publication information

F1000Res. 2021 Apr 1;10. doi: 10.12688/f1000research.28449.1. eCollection 2021.

DOI:10.12688/f1000research.28449.1

PMID:34249331

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8226415/

Abstract

Many types of data from genomic analyses can be represented as genomic tracks, features linked to the genomic coordinates of a reference genome. Examples of such data are epigenetic DNA methylation data, ChIP-seq peaks, germline or somatic DNA variants, as well as RNA-seq expression levels. Researchers often face difficulties in locating, accessing and combining relevant tracks from external sources, as well as locating the raw data, reducing the value of the generated information. We propose to advance the application of FAIR data principles (Findable, Accessible, Interoperable, and Reusable) to produce searchable metadata for genomic tracks. Findability and Accessibility of metadata can then be ensured by a track search service that integrates globally identifiable metadata from various track hubs in the Track Hub Registry and other relevant repositories. Interoperability and Reusability need to be ensured by the specification and implementation of a basic set of recommendations for metadata. We have tested this concept by developing such a specification in a JSON Schema, called FAIRtracks, and have integrated it into a novel track search service, called TrackFind. We demonstrate practical usage by importing datasets through TrackFind into existing examples of relevant analytical tools for genomic tracks: EPICO and the GSuite HyperBrowser. We here provide a first iteration of a draft standard for genomic track metadata, as well as the accompanying software ecosystem. It can easily be adapted or extended to future needs of the research community regarding data, methods and tools, balancing the requirements of both data submitters and analytical end-users.
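
To make the idea of schema-checked track metadata concrete, here is a minimal sketch of how a metadata record might be checked for required fields and well-formed identifiers. It is not the FAIRtracks specification and does not use a real JSON Schema validator: the field names (`local_id`, `genome_assembly`, `file_url`, `experiment_ref`), the identifier pattern and the `validate_track` helper are all hypothetical, chosen only to illustrate the principle with the Python standard library.

```python
import re

# Hypothetical, heavily simplified stand-in for a track metadata schema.
# Field names and patterns are illustrative assumptions, not the actual
# FAIRtracks specification.
TRACK_SCHEMA = {
    "required": ["local_id", "genome_assembly", "file_url", "experiment_ref"],
    "patterns": {
        # Compact identifier of the form "prefix:accession"
        "experiment_ref": r"^[A-Za-z0-9_.]+:\S+$",
        # Track files should be reachable through a resolvable URL
        "file_url": r"^(https?|ftp)://\S+$",
    },
}


def validate_track(record, schema=TRACK_SCHEMA):
    """Return a list of problems found in the record; empty means valid."""
    problems = []
    # Every required field must be present
    for field in schema["required"]:
        if field not in record:
            problems.append(f"missing required field: {field}")
    # Fields that are present must match their pattern
    for field, pattern in schema["patterns"].items():
        value = record.get(field)
        if value is not None and not re.match(pattern, str(value)):
            problems.append(f"malformed value for {field}: {value!r}")
    return problems


# Made-up example records (example.org URL and accession are placeholders)
example_track = {
    "local_id": "track_001",
    "genome_assembly": "GRCh38",
    "file_url": "https://example.org/tracks/track_001.bed",
    "experiment_ref": "example_db:EXP0001",
}
incomplete_track = {"local_id": "track_002", "file_url": "track_002.bed"}

print(validate_track(example_track))
for problem in validate_track(incomplete_track):
    print(problem)
```

In practice a full JSON Schema validator would replace the hand-written checks; the sketch only shows why a shared, machine-checkable specification lets a search service reject or flag incomplete records before indexing them.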

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a766/8226415/506bd12e739b/f1000research-10-31488-g0000.jpg

Similar articles

Recommendations for the FAIRification of genomic track metadata.

F1000Res. 2021 Apr 1;10. doi: 10.12688/f1000research.28449.1. eCollection 2021.

From Raw Data to FAIR Data: The FAIRification Workflow for Health Research.

Methods Inf Med. 2020 Jun;59(S 01):e21-e32. doi: 10.1055/s-0040-1713684. Epub 2020 Jul 3.

hGSuite HyperBrowser: A web-based toolkit for hierarchical metadata-informed analysis of genomic tracks.

PLoS One. 2023 Jul 19;18(7):e0286330. doi: 10.1371/journal.pone.0286330. eCollection 2023.

Making Metadata Machine-Readable as the First Step to Providing Findable, Accessible, Interoperable, and Reusable Population Health Data: Framework Development and Implementation Study.

Online J Public Health Inform. 2024 Aug 1;16:e56237. doi: 10.2196/56237.

GSuite HyperBrowser: integrative analysis of dataset collections across the genome and epigenome.

Gigascience. 2017 Jul 1;6(7):1-12. doi: 10.1093/gigascience/gix032.

De-novo FAIRification via an Electronic Data Capture system by automated transformation of filled electronic Case Report Forms into machine-readable data.

J Biomed Inform. 2021 Oct;122:103897. doi: 10.1016/j.jbi.2021.103897. Epub 2021 Aug 26.

Developing a standardized but extendable framework to increase the findability of infectious disease datasets.

Sci Data. 2023 Feb 23;10(1):99. doi: 10.1038/s41597-023-01968-9.

How to Assess FAIRness of Your Data - A Summary of Testing Two FAIR Validators.

Stud Health Technol Inform. 2024 Jan 25;310:154-158. doi: 10.3233/SHTI230946.

Adamant: a JSON schema-based metadata editor for research data management workflows.

F1000Res. 2022 Apr 29;11:475. doi: 10.12688/f1000research.110875.2. eCollection 2022.

OSSE Goes FAIR - Implementation of the FAIR Data Principles for an Open-Source Registry for Rare Diseases.

Stud Health Technol Inform. 2018;253:209-213.

Cited by

Advances in whole genome sequencing for foodborne pathogens: implications for clinical infectious disease surveillance and public health.

Front Cell Infect Microbiol. 2025 Apr 28;15:1593219. doi: 10.3389/fcimb.2025.1593219. eCollection 2025.

Building a FAIR data ecosystem for incorporating single-cell transcriptomics data into agricultural genome to phenome research.

Front Genet. 2024 Nov 29;15:1460351. doi: 10.3389/fgene.2024.1460351. eCollection 2024.

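hGSuite HyperBrowser: A web-based toolkit for hierarchical metadata-informed analysis of genomic tracks.

PLoS One. 2023 Jul 19;18(7):e0286330. doi: 10.1371/journal.pone.0286330. eCollection 2023.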

Challenges to sharing sample metadata in computational genomics.

Front Genet. 2023 May 23;14:1154198. doi: 10.3389/fgene.2023.1154198. eCollection 2023.

Resources and tools for rare disease variant interpretation.

Front Mol Biosci. 2023 May 10;10:1169109. doi: 10.3389/fmolb.2023.1169109. eCollection 2023.

Schema Playground: a tool for authoring, extending, and using metadata schemas to improve FAIRness of biomedical data.

BMC Bioinformatics. 2023 Apr 20;24(1):159. doi: 10.1186/s12859-023-05258-4.

GrainGenes: a data-rich repository for small grains genetics and genomics.

Database (Oxford). 2022 May 25;2022. doi: 10.1093/database/baac034.

FAIR Genomes metadata schema promoting Next Generation Sequencing data reuse in Dutch healthcare and research.

Sci Data. 2022 Apr 13;9(1):169. doi: 10.1038/s41597-022-01265-x.
