Her Qoua, Malenfant Jessica, Zhang Zilu, Vilk Yury, Young Jessica, Tabano David, Hamilton Jack, Johnson Ron, Raebel Marsha, Boudreau Denise, Toh Sengwee
Harvard Medical School, Harvard Pilgrim Health Care Institute, Boston, MA, United States.
Institute for Health Research, Kaiser Permanente Colorado, Denver, CO, United States.
JMIR Med Inform. 2020 Jun 4;8(6):e15073. doi: 10.2196/15073.
A distributed data network approach combined with distributed regression analysis (DRA) can reduce the risk of disclosing sensitive individual and institutional information in multicenter studies. However, software that facilitates large-scale and efficient implementation of DRA is limited.
This study aimed to assess the precision and operational performance of a DRA application comprising a SAS-based DRA package and a file transfer workflow developed within the open-source distributed networking software PopMedNet in a horizontally partitioned distributed data network.
We executed the SAS-based DRA package to perform distributed linear, logistic, and Cox proportional hazards regression analysis on a real-world test case with 3 data partners. We used PopMedNet to iteratively and automatically transfer highly summarized information between the data partners and the analysis center. We compared the DRA results with the results from standard SAS procedures executed on the pooled individual-level dataset to evaluate the precision of the SAS-based DRA package. We computed the execution time of each step in the workflow to evaluate the operational performance of the PopMedNet-driven file transfer workflow.
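To illustrate the general idea behind distributed regression on horizontally partitioned data, the following is a minimal Python sketch of the linear case only. It is not the SAS-based DRA package or the PopMedNet workflow described here. It assumes each data partner shares only the summary matrices X'X and X'y, that the analysis center sums them and solves the normal equations, and that precision is measured as the largest absolute difference from a pooled fit. All function names and the toy data are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def site_summaries(X: Matrix, y: Vector) -> Tuple[Matrix, Vector]:
    """Run at a data partner: return X'X and X'y (no row-level data leaves the site)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return xtx, xty


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def distributed_ols(summaries: List[Tuple[Matrix, Vector]]) -> Vector:
    """Run at the analysis center: add up the site summaries, then solve."""
    p = len(summaries[0][1])
    xtx = [[sum(s[0][i][j] for s in summaries) for j in range(p)] for i in range(p)]
    xty = [sum(s[1][i] for s in summaries) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical toy data: 3 data partners, columns are intercept and one covariate.
sites = [
    ([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.1, 2.9, 5.2]),
    ([[1.0, 3.0], [1.0, 4.0]], [7.1, 8.8]),
    ([[1.0, 5.0], [1.0, 6.0], [1.0, 7.0]], [11.2, 12.9, 15.1]),
]

# Distributed fit: only summary matrices reach the analysis center.
beta_dra = distributed_ols([site_summaries(X, y) for X, y in sites])

# Reference fit on the pooled individual-level rows.
pooled_X = [row for X, _ in sites for row in X]
pooled_y = [yi for _, y in sites for yi in y]
beta_pooled = solve(*site_summaries(pooled_X, pooled_y))

# Precision check: largest absolute difference between the two fits.
max_abs_diff = max(abs(a - b) for a, b in zip(beta_dra, beta_pooled))
print(beta_dra, max_abs_diff)
```

For linear regression a single exchange of summaries is enough. Logistic and Cox models have no closed-form solution, so a sketch like this would need repeated rounds in which sites return score and information summaries at the current coefficient estimates, which is consistent with the iterative file transfer described above.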
All DRA results were precise (<10), and DRA model fit curves were identical or similar to those obtained from the corresponding pooled individual-level data analyses. All regression models required less than 20 min for full end-to-end execution.
We integrated a SAS-based DRA package with PopMedNet and successfully tested the new capability within an active distributed data network. The study demonstrated the validity and feasibility of using DRA to enable more privacy-protecting analysis in multicenter studies.