Study Design and Data Analysis, College of Public Health, University of South Florida, Tampa, FL 33612, United States.
Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, United States.
J Am Med Inform Assoc. 2024 Apr 19;31(5):1135-1143. doi: 10.1093/jamia/ocae038.
Clinical trial data sharing is crucial for promoting transparency and collaborative efforts in medical research. Differential privacy (DP) is a formal statistical technique for anonymizing shared data that balances the privacy of individual records against the accuracy of replicated results through a "privacy budget" parameter, ε. DP is considered the state of the art in privacy-protected data publication, yet it remains underutilized in clinical trial data sharing. This study aims to identify suitable ε values for sharing clinical trial data.
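For reference, the standard definition of ε-differential privacy and the Laplace mechanism commonly used to satisfy it are given below. The abstract does not state which mechanism the authors used, so the Laplace mechanism here is an illustrative assumption. A randomized mechanism $\mathcal{M}$ is ε-DP if, for all neighboring datasets $D, D'$ (differing in one record) and every set of outputs $S$:

$$
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S].
$$

The Laplace mechanism releases a numeric statistic $f$ with noise calibrated to its sensitivity $\Delta f$:

$$
\mathcal{M}(D) = f(D) + \operatorname{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right),
\qquad
\Delta f = \max_{D,\,D' \text{ neighbors}} \lvert f(D) - f(D') \rvert .
$$

Because the noise scale $\Delta f / \varepsilon$ grows as ε shrinks, smaller budgets give stronger privacy and noisier estimates.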
We analyzed 2 clinical trial datasets, applying DP with privacy budgets ε ranging from 0.01 to 10. Smaller values of ε entail adding more random noise and therefore provide stronger privacy. We compared rates, odds ratios, means, and mean differences computed from the original clinical trial datasets with the empirical distributions of the corresponding DP estimators.
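The comparison against "the empirical distribution of the DP estimator" can be illustrated by repeating a DP release many times for each ε and summarizing the draws. The sketch below is a minimal example for a rate, assuming the Laplace mechanism, a public sample size n, and neighboring datasets that differ by replacing one participant's 0/1 outcome (so the rate has sensitivity 1/n). The function names `dp_rate` and `empirical_distribution` and the synthetic data are hypothetical, not the authors' code or the trial data.

```python
import numpy as np


def dp_rate(events: np.ndarray, epsilon: float, rng: np.random.Generator) -> float:
    """Return an epsilon-DP estimate of an event rate via the Laplace mechanism.

    Assumes n is public and neighboring datasets differ by replacing one
    participant's 0/1 outcome, so the count has sensitivity 1 and the rate 1/n.
    """
    n = events.size
    noise = rng.laplace(loc=0.0, scale=1.0 / (n * epsilon))
    return float(events.mean() + noise)


def empirical_distribution(events: np.ndarray, epsilon: float,
                           n_draws: int = 1000, seed: int = 0) -> np.ndarray:
    """Repeat the DP release n_draws times to approximate the estimator's distribution."""
    rng = np.random.default_rng(seed)
    return np.array([dp_rate(events, epsilon, rng) for _ in range(n_draws)])


if __name__ == "__main__":
    # Synthetic 0/1 outcomes for illustration: 65 events among 1,000 participants (6.5%)
    events = np.zeros(1000, dtype=int)
    events[:65] = 1
    for eps in (0.01, 0.1, 1.0, 3.0, 5.0, 10.0):
        draws = empirical_distribution(events, eps)
        lo, hi = np.percentile(draws, [2.5, 97.5])
        print(f"eps={eps:>5}: mean={draws.mean():.4f}, 95% range=({lo:.4f}, {hi:.4f})")
```

The repeated draws are only a way to evaluate the estimator's accuracy; a real data release would publish a single draw, since each additional release spends more of the privacy budget.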
The DP rate closely approximated the original rate of 6.5% when ε > 1. The DP odds ratio closely aligned with the original odds ratio of 0.689 when ε ≥ 3. The DP mean closely approximated the original mean of 164.64 when ε ≥ 1. As ε increased to 5, both the minimum and maximum DP means converged toward the original mean.
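The odds ratio and mean results depend on DP versions of those statistics, which the abstract does not describe. As a hedged sketch under the same assumptions (Laplace mechanism, public n, replace-one neighbors), a DP mean can be formed by clamping values to bounds fixed in advance, and a DP odds ratio by adding noise to each cell of the 2×2 table and computing the ratio from the noisy counts. The bounds, the positive floor for noisy cells, the helper names `dp_mean` and `dp_odds_ratio`, and the synthetic inputs are all hypothetical choices for illustration.

```python
import numpy as np


def dp_mean(values, epsilon: float, lower: float, upper: float,
            rng: np.random.Generator) -> float:
    """epsilon-DP mean via the Laplace mechanism on clamped values.

    Assumes n is public, neighbors differ by replacing one record, and
    [lower, upper] are fixed a priori (not derived from the data), so the
    mean has sensitivity (upper - lower) / n.
    """
    x = np.clip(np.asarray(values, dtype=float), lower, upper)
    scale = (upper - lower) / (x.size * epsilon)
    return float(x.mean() + rng.laplace(0.0, scale))


def dp_odds_ratio(a: int, b: int, c: int, d: int, epsilon: float,
                  rng: np.random.Generator, floor: float = 0.5) -> float:
    """epsilon-DP odds ratio from a 2x2 table of counts.

    a, b: events / non-events in the treatment arm
    c, d: events / non-events in the control arm
    Replacing one participant removes one unit from one cell and adds it to
    another, so the table's L1 sensitivity is 2: each cell gets Laplace(2/epsilon).
    """
    noisy = np.array([a, b, c, d], dtype=float) + rng.laplace(0.0, 2.0 / epsilon, size=4)
    # Post-processing (no extra privacy cost): keep cells positive so the ratio is defined
    na, nb, nc, nd = np.maximum(noisy, floor)
    return float((na * nd) / (nb * nc))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic data for illustration only; not the trials analyzed in the study
    values = rng.normal(loc=165.0, scale=30.0, size=500)
    for eps in (0.1, 1.0, 3.0, 5.0):
        # Releasing both statistics spends 2 * eps in total (sequential composition)
        m = dp_mean(values, eps, lower=50.0, upper=300.0, rng=rng)
        odds = dp_odds_ratio(30, 470, 40, 460, eps, rng)
        print(f"eps={eps}: DP mean={m:.2f}, DP odds ratio={odds:.3f}")
```

In this sketch the odds ratio combines four independently noised counts in a ratio, so its noise can be amplified when some cells are small, whereas the mean receives a single noise draw.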
There is no consensus on how to choose the privacy budget ε. The definition of DP does not specify the required level of privacy, and there is no established formula for determining ε.
Our findings suggest that the application of DP holds promise in the context of sharing clinical trial data.