Bonomi Luca, Lionts Marilyn, Fan Liyue
Dept. Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN.
Dept. Computer Science, Vanderbilt University, Nashville, TN.
Proc IEEE Int Conf Big Data. 2023 Dec;2023:5444-5453. doi: 10.1109/BigData59044.2023.10386571.
Effective disease surveillance systems require large-scale epidemiological data to improve health outcomes and quality of care for the general population. Because data may be limited within a single site, multi-site data (e.g., from several local or regional health systems) need to be considered. Leveraging distributed data across multiple sites for epidemiological analysis poses significant challenges. Due to the sensitive nature of epidemiological data, it is imperative to design distributed solutions that provide strong privacy protections. Current privacy solutions often assume a central site, which is responsible for aggregating the distributed data and applying privacy protection before sharing the results (e.g., aggregation via secure primitives and differential privacy for sharing aggregate results). However, identifying such a central site may be difficult in practice, and relying on it may introduce vulnerabilities (e.g., a single point of failure). Furthermore, to support clinical interventions and inform policy decisions in a timely manner, epidemiological analyses need to reflect dynamic changes in the data. Yet existing distributed privacy-protecting approaches were largely designed for static data (e.g., one-time data sharing) and cannot fulfill dynamic data requirements. In this work, we propose a privacy-protecting approach that supports the sharing of dynamic epidemiological analyses and provides strong privacy protection in a decentralized manner. We apply our solution to continuous survival analysis using the Kaplan-Meier estimation model while providing differential privacy protection. Our evaluations on a real dataset containing COVID-19 cases show that our method provides highly usable results.
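To make the combination of Kaplan-Meier estimation and differential privacy concrete, the following is a minimal sketch of one generic way to perturb a survival curve. It is not the decentralized, dynamic protocol proposed in this work: it assumes a single trusted party, static data, a fixed and public number of time periods, and one record per individual. The function names (`kaplan_meier`, `dp_kaplan_meier`) and the choice of Laplace noise on per-period event and censoring counts are illustrative assumptions.

```python
import random


def kaplan_meier(event_counts, censor_counts):
    """Product-limit estimate from per-period counts.

    event_counts[t] and censor_counts[t] are the numbers of events and
    censorings observed in period t (non-negative, possibly non-integer).
    Returns the estimated survival probability at the end of each period.
    """
    at_risk = sum(event_counts) + sum(censor_counts)
    survival, curve = 1.0, []
    for d, c in zip(event_counts, censor_counts):
        if at_risk > 0:
            # Guard against floating-point drift so the factor stays in [0, 1].
            factor = 1.0 - min(d, at_risk) / at_risk
            survival *= max(0.0, factor)
        curve.append(survival)
        at_risk = max(at_risk - d - c, 0.0)
    return curve


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_kaplan_meier(event_counts, censor_counts, epsilon, rng=None):
    """Hypothetical centralized sketch: perturb counts, then post-process.

    Assumption: each individual falls in exactly one cell of the two
    histograms, so adding or removing one individual changes them by 1 in
    L1 norm, and Laplace noise with scale 1/epsilon on every cell gives
    epsilon-differential privacy. Clamping and the Kaplan-Meier
    computation are post-processing of the noisy counts.
    """
    rng = rng or random.Random()
    scale = 1.0 / epsilon
    noisy_events = [max(d + laplace(scale, rng), 0.0) for d in event_counts]
    noisy_censor = [max(c + laplace(scale, rng), 0.0) for c in censor_counts]
    return kaplan_meier(noisy_events, noisy_censor)


if __name__ == "__main__":
    events, censored = [1, 0, 2], [0, 1, 1]
    print(kaplan_meier(events, censored))  # approximately [0.8, 0.8, 0.267]
    print(dp_kaplan_meier(events, censored, epsilon=1.0))
```

With so few individuals the noisy curve can deviate substantially from the exact one; the sketch only illustrates the mechanics, and a smaller `epsilon` (stronger privacy) widens that gap further.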