A Secure and Reusable Software Architecture for Supporting Online Data Harmonization.

Author information

Feric Zlatan, Bohm Agostini Nicolas, Beene Daniel, Signes-Pastor Antonio J, Halchenko Yuliya, Watkins Deborah, MacKenzie Debra, Karagas Margaret, Manjourides Justin, Alshawabkeh Akram, Kaeli David

Affiliations

Dept. of Electrical and Computer Engineering, Northeastern University.

Community Environmental Health Program, College of Pharmacy, Health Sciences Center, University of New Mexico.

Publication information

Proc IEEE Int Conf Big Data. 2021 Dec;2021:2801-2812. doi: 10.1109/bigdata52589.2021.9671538.

Abstract

Retrospective data harmonization across multiple research cohorts and studies is frequently done to increase statistical power, enable comparative analyses, and create a richer data source for data mining. However, when combining disparate data sources, harmonization projects face data management and analysis challenges. These include differences in data dictionaries and variable definitions, privacy concerns surrounding health data from sensitive populations, and the lack of properly defined data models. With mature open-source, web-based database technologies now available, developing a complete software architecture that addresses the challenges of the harmonization process can remove many of these roadblocks. By leveraging state-of-the-art software engineering and database principles, we can ensure data quality and enable cross-center online access and collaboration. This paper outlines a complete software architecture, developed and customized using the Django web framework, that we used to harmonize sensitive data collected from three NIH-supported birth cohorts. We describe our framework and show how we overcame the challenges faced when harmonizing data from these cohorts. We discuss our efforts in data cleaning, data sharing, data transformation, data visualization, and analytics, and reflect on what we have learned to date from these harmonized datasets.
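
The abstract names differing data dictionaries and variable definitions as a core obstacle but does not say how cross-cohort mappings are represented. As a minimal sketch, assuming a per-cohort table that maps each source variable to a harmonized name plus an optional unit or coding conversion, record-level harmonization could look like the following. This is plain Python rather than the authors' Django code, and every cohort name, variable name, and function here (`VariableMapping`, `MAPPINGS`, `harmonize`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class VariableMapping:
    """Hypothetical link from one cohort's variable to the harmonized schema."""
    source_name: str  # variable name in the cohort's own data dictionary
    target_name: str  # variable name in the shared, harmonized dictionary
    transform: Callable[[Any], Any] = lambda v: v  # unit/coding conversion (identity by default)


# Hypothetical data dictionaries for two cohorts that record the same
# concepts under different names and units.
MAPPINGS: Dict[str, List[VariableMapping]] = {
    "cohort_a": [
        VariableMapping("mat_age_yrs", "maternal_age_years"),
        VariableMapping("bw_g", "birth_weight_g"),
    ],
    "cohort_b": [
        VariableMapping("MomAge", "maternal_age_years"),
        # cohort_b stores birth weight in kilograms; convert to grams
        VariableMapping("BirthWtKg", "birth_weight_g", lambda kg: round(kg * 1000)),
    ],
}


def harmonize(cohort: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename and convert one source record into the harmonized schema.

    Variables missing from the source record become None instead of being
    dropped, so every harmonized record has the same set of keys.
    """
    out: Dict[str, Any] = {"cohort": cohort}
    for m in MAPPINGS[cohort]:  # raises KeyError for an unregistered cohort
        value = record.get(m.source_name)
        out[m.target_name] = None if value is None else m.transform(value)
    return out


if __name__ == "__main__":
    print(harmonize("cohort_b", {"MomAge": 31, "BirthWtKg": 3.4}))
    print(harmonize("cohort_a", {"mat_age_yrs": 28}))
```

Keeping the mappings as data rather than as per-cohort code keeps each variable definition reviewable by study staff; in a Django deployment, such a table could plausibly live in a database model, though the abstract does not say how the authors stored theirs.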

Similar articles

A review of harmonization methods for studying dietary patterns.
Smart Health (Amst). 2022 Mar;23. doi: 10.1016/j.smhl.2021.100263. Epub 2022 Jan 13.

Data Integration for Future Medicine (DIFUTURE).
Methods Inf Med. 2018 Jul;57(S 01):e57-e65. doi: 10.3414/ME17-02-0022. Epub 2018 Jul 17.