Division of Biostatistics and Epidemiology, Department of Public Health, Amherst, MA, USA.
Acad Radiol. 2013 Jul;20(7):889-96. doi: 10.1016/j.acra.2013.04.011.
Grant-funding institutions often require organizations to share their collected data as widely as possible while safeguarding the privacy of individuals. Summaries based on these data are often released in place of the data themselves. Here, the receiver operating characteristic (ROC) curve is examined as a potential source of statistical disclosure when auxiliary data are available.
Formulas are introduced for calculating the data points missing from a user's copy of the full data set, given that the user has an empirical ROC curve and a subset of the data used to generate it. The plausibility of this scenario is also discussed.
Diagnostic test data were simulated and an ROC curve was produced. Using a subset of the true data and the points on the empirical ROC curve, an attempt was made to reproduce the missing parts of the data. Disease statuses could be determined exactly, whereas test scores could be recovered only up to their rank.
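The rank-level recovery described above can be illustrated with a minimal sketch. It is not the paper's formulas, which are not reproduced in this abstract. It rests on two stated assumptions: there are no tied test scores, and every vertex of the empirical ROC curve is available. Under those assumptions each vertical step on the curve corresponds to a diseased subject and each horizontal step to a non-diseased subject, so the sequence of disease statuses in descending score order can be read off directly. The function names `empirical_roc` and `labels_from_roc`, the sample sizes, and the simulated score distribution are all hypothetical choices made for illustration.

```python
import numpy as np


def empirical_roc(scores, labels):
    """Return the ROC vertices (FPR, TPR), starting at (0, 0).

    Assumes no tied scores and labels coded 1 = diseased, 0 = non-diseased.
    """
    order = np.argsort(-np.asarray(scores))  # highest score first
    y = np.asarray(labels)[order]
    tpr = np.concatenate([[0.0], np.cumsum(y) / y.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - y) / (1 - y).sum()])
    return fpr, tpr


def labels_from_roc(fpr, tpr):
    """Recover disease status in descending score-rank order.

    A step that raises TPR is a diseased subject; any other step
    (one that raises FPR) is a non-diseased subject.
    """
    return (np.diff(tpr) > 0).astype(int)


# Simulated diagnostic test data (illustrative values only).
rng = np.random.default_rng(0)
labels = np.array([1] * 8 + [0] * 12)
scores = rng.normal(loc=labels, scale=1.0)  # diseased subjects score higher on average

fpr, tpr = empirical_roc(scores, labels)
recovered = labels_from_roc(fpr, tpr)
true_by_rank = labels[np.argsort(-scores)]

print("statuses recovered exactly:", np.array_equal(recovered, true_by_rank))
```

The sketch recovers statuses only by rank position. Linking a rank to a specific individual or to an actual score value requires auxiliary data, such as the known subset of records considered in the scenario above.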
If an individual or organization possesses the points of an empirical ROC curve and a subset of the true data, the true data underlying the ROC curve can be reproduced relatively accurately. As a result, the release of data summaries, including the ROC curve, must be given careful thought from a statistical disclosure perspective.