Making Metadata Machine-Readable as the First Step to Providing Findable, Accessible, Interoperable, and Reusable Population Health Data: Framework Development and Implementation Study.

Author information

Amadi David, Kiwuwa-Muyingo Sylvia, Bhattacharjee Tathagata, Taylor Amelia, Kiragga Agnes, Ochola Michael, Kanjala Chifundo, Gregory Arofan, Tomlin Keith, Todd Jim, Greenfield Jay

Affiliations

Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom.

African Population and Health Research Center, Nairobi, Kenya.

Publication information

Online J Public Health Inform. 2024 Aug 1;16:e56237. doi: 10.2196/56237.

Abstract

BACKGROUND

Metadata describe and provide context for other data and play a pivotal role in implementing the findability, accessibility, interoperability, and reusability (FAIR) data principles. By providing comprehensive, machine-readable descriptions of digital resources, metadata enable both machines and human users to discover, access, integrate, and reuse data or content across diverse platforms and applications. However, existing metadata for population health data are of limited accessibility and machine-interpretability, which hinders effective data discovery and reuse.

OBJECTIVE

To address these challenges, we propose a comprehensive framework using standardized formats, vocabularies, and protocols to render population health data machine-readable, significantly enhancing their FAIRness and enabling seamless discovery, access, and integration across diverse platforms and research applications.

METHODS

The framework implements a 3-stage approach. The first stage is Data Documentation Initiative (DDI) integration, which uses DDI Codebook metadata to document detailed information about the data and associated assets, ensuring transparency and comprehensiveness. The second stage is Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standardization, in which the data are harmonized and standardized into the OMOP CDM to facilitate unified analysis across heterogeneous data sets. The third stage is the integration of Schema.org and JavaScript Object Notation for Linked Data (JSON-LD): machine-readable metadata are generated using Schema.org entities and embedded within the data using JSON-LD, improving discoverability and comprehension for both machines and human users. We demonstrated the implementation of these 3 stages using Integrated Disease Surveillance and Response (IDSR) data from Malawi and Kenya.
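As a rough illustration of the third stage, the sketch below maps a handful of DDI Codebook-style fields onto a Schema.org `Dataset` description serialized as JSON-LD. It is not the authors' code: the function name `ddi_to_schema_org`, the input dictionary layout, and all example values (title, variables, keywords) are hypothetical and chosen only to show the shape of the output. The Schema.org terms used (`Dataset`, `Place`, `PropertyValue`, `spatialCoverage`, `variableMeasured`) are standard vocabulary.

```python
import json


def ddi_to_schema_org(ddi: dict) -> dict:
    """Map a few DDI Codebook-style fields to a Schema.org Dataset (JSON-LD).

    The input layout is an assumption made for this sketch; a real DDI
    Codebook is an XML document with a much richer structure.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": ddi["title"],
        "description": ddi["abstract"],
        "keywords": ddi.get("keywords", []),
        # One Place entity per country covered by the data set.
        "spatialCoverage": [
            {"@type": "Place", "name": country}
            for country in ddi.get("countries", [])
        ],
        # One PropertyValue entity per documented variable.
        "variableMeasured": [
            {"@type": "PropertyValue", "name": v["name"], "description": v["label"]}
            for v in ddi.get("variables", [])
        ],
    }


# Hypothetical study-level metadata, for illustration only.
example_ddi = {
    "title": "Example IDSR surveillance extract (hypothetical)",
    "abstract": "Illustrative description of an IDSR data set.",
    "keywords": ["IDSR", "population health", "surveillance"],
    "countries": ["Malawi", "Kenya"],
    "variables": [
        {"name": "report_week", "label": "Epidemiological week of the report"},
        {"name": "case_count", "label": "Number of reported cases"},
    ],
}

dataset_jsonld = ddi_to_schema_org(example_ddi)
print(json.dumps(dataset_jsonld, indent=2))
```

Keeping the mapping in one small function makes it easy to see which source field feeds which Schema.org property; a production mapping would need to cover licensing, identifiers, and provenance as well.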

RESULTS

The implementation of our framework significantly enhanced the FAIRness of population health data, resulting in improved discoverability through seamless integration with platforms such as Google Dataset Search. The adoption of standardized formats and protocols streamlined data accessibility and integration across various research environments, fostering collaboration and knowledge sharing. Additionally, the use of machine-interpretable metadata empowered researchers to efficiently reuse data for targeted analyses and insights, thereby maximizing the overall value of population health resources. The JSON-LD codes are accessible via a GitHub repository and the HTML code integrated with JSON-LD is available on the Implementation Network for Sharing Population Information from Research Entities website.
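To make the "HTML code integrated with JSON-LD" concrete, here is a minimal sketch of how a JSON-LD description can be embedded in a page inside a `<script type="application/ld+json">` element, and how a crawler-like consumer could read it back. This is an assumption-laden illustration rather than the code published by the authors: `embed_jsonld`, `JsonLdExtractor`, and the example metadata are hypothetical names and values, and only the Python standard library is used.

```python
import html
import json
from html.parser import HTMLParser


def embed_jsonld(metadata: dict, page_title: str) -> str:
    """Return a minimal HTML page carrying `metadata` as embedded JSON-LD."""
    # Escape "</" so the payload cannot close the <script> element early;
    # "\/" is a valid JSON escape for "/", so the data round-trips unchanged.
    payload = json.dumps(metadata, indent=2).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>{html.escape(page_title)}</title>\n"
        f'<script type="application/ld+json">\n{payload}\n</script>\n'
        "</head>\n<body></body>\n</html>\n"
    )


class JsonLdExtractor(HTMLParser):
    """Collect every JSON-LD block found in an HTML document."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hypothetical metadata, for illustration only.
metadata = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example IDSR surveillance extract (hypothetical)",
    "description": "Illustrative description of an IDSR data set.",
}

page = embed_jsonld(metadata, "Example data set landing page")
extractor = JsonLdExtractor()
extractor.feed(page)
extractor.close()
print(extractor.blocks[0]["name"])
```

Reading the block back with a parser is a cheap sanity check before publishing a landing page, since a malformed `<script>` payload is silently ignored by most consumers.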

CONCLUSIONS

The adoption of machine-readable metadata standards is essential for ensuring the FAIRness of population health data. By embracing these standards, organizations can enhance the visibility, accessibility, and utility of diverse resources, leading to a broader impact, particularly in low- and middle-income countries. Machine-readable metadata can accelerate research, improve health care decision-making, and ultimately promote better health outcomes for populations worldwide.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/13f8/11327634/30a22b33cc88/ojphi_v16i1e56237_fig1.jpg
