Cadman Tim, Slofstra Mariska K, van der Geest Marije A, Avraam Demetris, Bishop Tom R P, de Boer Tommy, Duijts Liesbeth, Haakma Sido, Hyde Eleanor, Jaddoe Vincent, Karramass Tarik, Kelpin Fleur, Marcon Yannick, Pinot de Moira Angela, Postma Dick, Tolboom Clemens, Veenstra Ruben L, Wheater Stuart, Welten Marieke, Wilson Rebecca C, Zwart Erik, Swertz Morris
Department of Genetics, Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, 9700 RB, The Netherlands.
Department of Public Health, University of Copenhagen, Copenhagen, 1353, Denmark.
Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btae726.
Extensive human health data from cohort studies, national registries, and biobanks can reveal lifecourse risk factors impacting health. Combining these sources offers increased statistical power, detection of rare outcomes, replication of findings, and extended study periods. Traditionally, this required either transferring data to a central location or having each partner run separate analyses and pooling the summary statistics, both of which face ethical, legal, and time constraints. Federated analysis, in which data are analysed remotely without sharing individual-level data, is a promising alternative. One such solution is DataSHIELD (https://datashield.org/), an open-source, R-based implementation. To enable federated analysis, data owners need a user-friendly way to install the federated infrastructure and manage users and data. Here, we present MOLGENIS Armadillo: a lightweight server for federated analysis solutions such as DataSHIELD.
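To make the idea of federated analysis concrete, the following is a minimal sketch in Python of a pooled mean and variance computed from per-site aggregates only. It is not the DataSHIELD or Armadillo API: the names `SiteSummary`, `summarise_locally`, and `pooled_mean_and_variance`, and the `min_n` disclosure threshold, are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class SiteSummary:
    """Aggregates a site releases; no individual-level rows (hypothetical type)."""
    n: int
    total: float
    total_sq: float


def summarise_locally(values: Sequence[float], min_n: int = 5) -> SiteSummary:
    """Runs inside the data owner's environment and returns aggregates only.

    min_n is an assumed disclosure threshold: very small groups are refused.
    """
    if len(values) < min_n:
        raise ValueError("too few observations to release a summary")
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pooled_mean_and_variance(summaries: Iterable[SiteSummary]) -> Tuple[float, float]:
    """Runs on the analyst's side; combines site aggregates into pooled statistics."""
    summaries = list(summaries)
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance from sums: (sum(x^2) - n * mean^2) / (n - 1)
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


# Two hypothetical cohorts; only their summaries leave each site.
cohort_a = [1.0, 2.0, 3.0, 4.0, 5.0]
cohort_b = [6.0, 7.0, 8.0, 9.0, 10.0]
mean, variance = pooled_mean_and_variance(
    [summarise_locally(cohort_a), summarise_locally(cohort_b)]
)
```

The analyst obtains the same mean and variance as if the two cohorts had been pooled centrally, yet only three numbers per site are transferred. A real deployment additionally needs authentication, access control, and disclosure checks on every permitted operation.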
Armadillo is implemented as a collection of three packages freely available under the open-source licence LGPLv3: two R packages downloadable from the Comprehensive R Archive Network (CRAN) ("MolgenisArmadillo" and "DSMolgenisArmadillo") and one Java application ("ArmadilloService") available as JAR and Docker images via GitHub (https://github.com/molgenis/molgenis-service-armadillo).