IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):103-112. doi: 10.1109/TCBB.2018.2850037. Epub 2018 Jun 25.
Privacy is a major concern when sharing human subject data with researchers for secondary analyses. A simple binary consent (opt-in or not) may significantly reduce the amount of sharable data, since many patients might be concerned only about a few sensitive medical conditions rather than their entire medical record. We propose event-level privacy protection and develop a feature ablation method to protect event-level privacy in electronic medical records. Using a list of 13 sensitive diagnoses, we evaluate the feasibility and efficacy of the proposed method. As feature ablation progresses, the identifiability of a sensitive medical condition decreases, at a rate that varies by disease. We find that these sensitive diagnoses fall into three categories: (1) five diseases with rapidly declining identifiability (AUC below 0.6 with fewer than 400 features excluded); (2) seven diseases with progressively declining identifiability (AUC below 0.7 with between 200 and 700 features excluded); and (3) one disease with slowly declining identifiability (AUC still above 0.7 with 1,000 features excluded). That the majority (12 of 13) of the sensitive diseases fall into the first two categories suggests the potential of the proposed feature ablation method as a solution for event-level record privacy protection.
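The ablation-versus-identifiability curve described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic binary records, the prevalence-difference feature weights, the linear score used as a stand-in classifier, and the helper names `auc`, `fit_weights`, `ablation_curve`, and `make_records`. The abstract does not state which classifier or feature-ranking criterion the authors used.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_weights(X, y):
    """Assumed weighting: feature prevalence in positives minus prevalence in negatives."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    weights = []
    for j in range(len(X[0])):
        p1 = sum(row[j] for row, label in zip(X, y) if label == 1) / n_pos
        p0 = sum(row[j] for row, label in zip(X, y) if label == 0) / n_neg
        weights.append(p1 - p0)
    return weights

def ablation_curve(X_train, y_train, X_test, y_test, max_removed):
    """Held-out AUC after removing the k most predictive features, for k = 0..max_removed."""
    weights = fit_weights(X_train, y_train)
    # Most predictive features (largest absolute weight) are ablated first.
    order = sorted(range(len(weights)), key=lambda j: -abs(weights[j]))
    curve = []
    for k in range(max_removed + 1):
        kept = order[k:]
        scores = [sum(weights[j] * row[j] for j in kept) for row in X_test]
        curve.append(auc(scores, y_test))
    return curve

def make_records(n, rng, n_features=20, n_informative=5):
    """Synthetic records: the first n_informative features correlate with the diagnosis."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels so both classes are balanced
        row = []
        for j in range(n_features):
            if j < n_informative:
                p = 0.7 if label else 0.2
            else:
                p = 0.3  # uninformative background feature
            row.append(1 if rng.random() < p else 0)
        X.append(row)
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_records(200, rng)
X_test, y_test = make_records(200, rng)
curve = ablation_curve(X_train, y_train, X_test, y_test, max_removed=8)
for k, value in enumerate(curve):
    print(f"{k} features removed: AUC = {value:.3f}")
```

In this toy setting the held-out AUC should start high and fall toward chance once the informative features have been removed. A real study would substitute actual record features and a properly validated classifier, and would read off, per diagnosis, how many features must be excluded before the AUC drops below a chosen threshold such as 0.6 or 0.7.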