Sarrat-González David, Escribà-Montagut Xavier, Houghtaling Jared, González Juan R
Barcelona Institute for Global Health (ISGlobal), Avda Dr Aiguader, 88, Barcelona, Spain.
Tufts University School of Medicine, 145 Harrison Avenue, Boston, MA 02111, USA.
Bioinformatics. 2025 May 6. doi: 10.1093/bioinformatics/btaf286.
Collaborative clinical research projects face several challenges related to data sharing. Disparities between data standards, along with strict privacy regulations, become more pressing as the number of participating institutions increases. To address these challenges, the scientific community has progressively adopted common data models such as the OMOP CDM for multicenter data standardization, and has implemented federated data analysis platforms such as DataSHIELD to perform remote analyses without transferring individual-level data between centers, thus mitigating disclosure risks. However, there is no native implementation that automatically combines both solutions, revealing the need for a tool that enables interoperability between these systems.
We present dsOMOP, a collection of DataSHIELD packages that facilitates automated extraction and transformation of OMOP CDM data into DataSHIELD-compatible datasets, enabling disclosure-controlled federated analyses of standardized clinical data. dsOMOP allows research institutions to provide access to their data for collaborative projects in a format that is interoperable with the project's available data, thus facilitating the analysis of large-scale, multicenter clinical data. It incorporates OMOP data directly into the DataSHIELD workflow, where all analyses occur entirely in a federated environment subject to rigorous disclosure controls, ensuring that only aggregated, non-disclosive results are ever returned to analysts.
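To make the two ideas in this paragraph concrete, the following is a minimal sketch in Python, not dsOMOP's actual implementation or API (the packages themselves are written for DataSHIELD). It assumes rows shaped like the OMOP CDM `measurement` table in long format, pivots them into one record per person, and then returns only an aggregate that passes a minimum-count check. The concept IDs, the function names `to_wide` and `safe_mean`, and the threshold `min_count=3` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows shaped like the OMOP CDM `measurement` table (long format):
# one row per person per measurement. Concept IDs are made up for illustration.
MEASUREMENT_ROWS = [
    {"person_id": 1, "measurement_concept_id": 1001, "value_as_number": 120.0},
    {"person_id": 1, "measurement_concept_id": 1002, "value_as_number": 80.0},
    {"person_id": 2, "measurement_concept_id": 1001, "value_as_number": 110.0},
    {"person_id": 3, "measurement_concept_id": 1001, "value_as_number": 130.0},
    {"person_id": 3, "measurement_concept_id": 1002, "value_as_number": 85.0},
]


def to_wide(rows):
    """Pivot long-format rows into one record per person, one column per concept."""
    wide = defaultdict(dict)
    for row in rows:
        column = f"concept_{row['measurement_concept_id']}"
        wide[row["person_id"]][column] = row["value_as_number"]
    return dict(wide)


def safe_mean(wide, column, min_count=3):
    """Return an aggregate mean, or None when too few people contribute.

    The min_count threshold is an assumed, illustrative disclosure rule.
    """
    values = [record[column] for record in wide.values() if column in record]
    if len(values) < min_count:
        return None  # suppress the result instead of exposing a small group
    return {"n": len(values), "mean": mean(values)}


if __name__ == "__main__":
    wide_table = to_wide(MEASUREMENT_ROWS)
    print(wide_table)
    print(safe_mean(wide_table, "concept_1001"))
    print(safe_mean(wide_table, "concept_1002"))
```

In this sketch the individual-level wide table never leaves the function that builds it; only the count and mean are returned, and a column with fewer than three contributing people yields no result at all. A real deployment would enforce such rules on the server side rather than trusting the analyst's code.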
The general information page for the dsOMOP environment is available at https://isglobal-brge.github.io/dsOMOP, where the most recent installation instructions and usage guides for all dsOMOP packages and their extensions can be found in the 'Packages' section. The dsOMOP package and its complementary tools are fully available under the MIT license on GitHub: dsOMOP (https://github.com/isglobal-brge/dsOMOP), dsOMOPClient (https://github.com/isglobal-brge/dsOMOPClient), dsOMOPHelper (https://github.com/isglobal-brge/dsOMOPHelper), and dsOMOP.oracle (https://github.com/isglobal-brge/dsOMOP.oracle). Usage vignettes for the client-side packages are available at the websites of dsOMOPClient (https://isglobal-brge.github.io/dsOMOPClient) and dsOMOPHelper (https://isglobal-brge.github.io/dsOMOPHelper). A permanent archival snapshot of the exact code used in this manuscript is deposited at Figshare: https://doi.org/10.6084/m9.figshare.28607186.
dsOMOP automates the analysis of OMOP CDM-compliant data within the DataSHIELD environment, enabling federated analyses of standardized clinical data while safeguarding patient privacy through controlled, aggregated outputs. The tool supports large-scale, multicenter research by facilitating data interoperability and aligning with privacy regulations.
Supplementary data are available at Bioinformatics online.