Babb Larry, Bult Carol, Carey Vincent J, Carroll Robert J, Hitz Benjamin C, Mungall Chris J, Rehm Heidi L, Schatz Michael C, Wagner Alex
Broad Institute of MIT and Harvard, Cambridge, MA.
The Jackson Laboratory, Bar Harbor, ME.
ArXiv. 2025 Aug 19:arXiv:2508.13498v1.
In 2024, individuals funded by NHGRI to support genomic community resources completed a Self-Assessment Tool (SAT) to evaluate their application of the FAIR (Findable, Accessible, Interoperable, and Reusable) principles and to assess their sustainability. The self-administered questionnaires and personal interviews provided a valuable perspective on the FAIRness and sustainability of the NHGRI resources. The results highlighted several challenges and key areas that the NHGRI resource community could improve by working together on recommendations. The next step was the formation of an Organizing Committee to identify which challenges could lead to best practices or guidelines for the community. The workshop's Organizing Committee comprised four members from the NHGRI resource community: Carol Bult, PhD, Chris Mungall, PhD, Heidi Rehm, PhD, and Michael Schatz, PhD. In December 2024, the Organizing Committee engaged with the NHGRI resource community to refine these challenges further, inviting feedback on potential focus areas for a future workshop. This collaborative approach led to two informative webinars in December 2024, which highlighted specific challenges in data curation, data processing, metadata tools, and variant identifiers within the NHGRI resources. Throughout the workshop planning process, the four Organizing Committee members worked together to develop themes, design breakout sessions, and build a detailed agenda. The agenda was intentionally structured so that participants could generate implementable recommendations for the NHGRI resource community. The two-day workshop was held in Bethesda, MD, on March 3-4, 2025. The challenges received from NHGRI resources were classified into four key categories that formed the basis of the workshop: variant identifiers, data processing, data curation, and metadata tools.
They are briefly described below, with greater details on their challenges and recommendations in subsequent sections.
Metadata Tools: While metadata is vital for capturing context in genomic datasets, its usage and relevance can vary by domain, making it difficult to standardize usage. While various methods exist for annotating and extracting metadata, incomplete or inconsistent annotations often result in ineffective data sharing and interoperability, further reducing data usability and reproducibility.
Data Curation: Curation of annotations for genomics data is critical for FAIRness. Scalable curation solutions are challenging because of the multiple components for curation, including harmonizing data sets, data cleaning, and annotation. The workshop focused on identifying which aspects of data curation could be streamlined using computational methods while considering the barriers to increased automation.
Variant Identifiers: Variant identifiers are standardized representations of genetic variants, crucial for sharing and interpreting genomic data in research and clinical work. They ensure consistent referencing and enable data aggregation. Standardizing variant identifiers is difficult due to varied formats, complex data, and distinct environments for generating and disseminating data.
Data Processing: Data processing is a necessary first step in a FAIR environment. As there are many variant workflows, streamlining this process will ensure greater accuracy, reproducibility, interoperability, and FAIRness, driving advancements in clinical research. The workshop focused on addressing these aspects with a key focus on improvements and best practices around data processing for an NHGRI resource.
Several recommendations were made throughout the workshop's interactive sessions with the resources' participants.
While many recommendations were specific to data processing, data curation, metadata tools, or variant identifiers, they can be grouped into core recommendations addressing common challenges within the NHGRI resource community. These core recommendations highlight the key themes that emerged across sessions and are listed below as nine recommendations.
1. Increase transparency to enable effective sharing/reproducibility (documenting, benchmarking, publishing, mapping)
2. Develop entity schema and ontology mapping tools (between models, identifiers, etc.)
3. Annotate tools using resources to increase findability and reuse (Examples: EDAM Ontology of Bioscientific data analysis and data management)
4. Use standard nomenclature and identifiers
5. Make workflows usable by researchers with limited programming expertise
6. Implement APIs to improve data connectivity
7. Present data in an interpretable manner, along with machine readability
8. Develop artificial intelligence/machine learning (AI/ML) methods for scaling curation processes
9. Assess the impact of resources using an independent group that can assess return on investment and impact on health and scientific advancement
An additional key collaborative outcome was the development of Appendix A, which outlines ongoing and future efforts by the NHGRI resource community, including additional workshops, webinars, and meetings. We hope that these activities will enable further advances in the implementation of FAIR standards and continue to foster collaboration and exchange across NHGRI resources and the global community.