Department of Pediatrics, Hospital for Special Surgery.
Appl Clin Inform. 2012 Oct 24;3(4):392-403. doi: 10.4338/ACI-2012-07-RA-0028. Print 2012.
The reuse of clinical data for research purposes requires methods for the protection of personal privacy. One general approach is the removal of personal identifiers from the data. A frequent part of this anonymization process is the removal of times and dates, which we refer to as "chrononymization." While this step can make the association with identified data (such as public information or a small sample of patient information) more difficult, it comes at a cost to the usefulness of the data for research.
We sought to determine whether removal of dates from common laboratory test panels offers any advantage in protecting such data from re-identification.
We obtained a set of results for 5.9 million laboratory panels from the National Institutes of Health's (NIH) Biomedical Translational Research Information System (BTRIS), selected a random set of 20,000 panels from the larger source sets, and then identified all matches between the sets.
We found that while removal of dates could hinder the re-identification of a single test result, such removal had almost no effect when entire panels were used.
Our results suggest that reliance on chrononymization provides a false sense of security for the protection of laboratory test results. As a result of this study, the NIH has chosen to rely on policy solutions, such as strong data use agreements, rather than removal of dates when reusing clinical data for research purposes.
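The matching comparison described above can be illustrated with a minimal Python sketch. It is not the study's code: the record layout, the analyte values, and the function names (`unique_match_rate`, `single_with_date`, and so on) are assumptions made for illustration, and the toy data are invented. The sketch counts how many sampled panels match exactly one panel in the source set, first using a single test result and then the whole panel, each with and without the date.

```python
from collections import Counter

def unique_match_rate(sample, source, key):
    """Fraction of sampled panels whose key matches exactly one source panel."""
    counts = Counter(key(p) for p in source)
    unique = sum(1 for p in sample if counts[key(p)] == 1)
    return unique / len(sample)

# Hypothetical toy records: (patient_id, date, (sodium, potassium, chloride)).
# These values are invented and are not taken from BTRIS.
source = [
    ("A", "2010-01-05", (140, 4.1, 102)),
    ("B", "2010-01-05", (138, 3.9, 100)),
    ("C", "2010-02-11", (140, 4.3, 101)),
    ("D", "2010-03-02", (142, 4.4, 104)),
]
sample = source  # the study drew a random sample; here we reuse all records

# Matching keys: one test result or the full panel, with or without the date.
def single_with_date(p):
    return (p[1], p[2][0])

def single_without_date(p):
    return (p[2][0],)

def panel_with_date(p):
    return (p[1], p[2])

def panel_without_date(p):
    return (p[2],)

rates = {
    "single, dated": unique_match_rate(sample, source, single_with_date),
    "single, undated": unique_match_rate(sample, source, single_without_date),
    "panel, dated": unique_match_rate(sample, source, panel_with_date),
    "panel, undated": unique_match_rate(sample, source, panel_without_date),
}

for label, rate in rates.items():
    print(f"{label}: {rate:.2f}")
```

In this toy data, dropping the date makes two of the four single sodium values ambiguous, while every full panel stays unique with or without the date. That mirrors the direction of the reported finding only; the actual rates depend on the real data.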