Li Sujun, Bandeira Nuno, Wang Xiaofeng, Tang Haixu
School of Informatics and Computing, Indiana University, Bloomington, IN, USA.
Department of Computer Science and Engineering, University of California, San Diego, CA, USA.
AMIA Jt Summits Transl Sci Proc. 2016 Aug 31;2016:122-31. eCollection 2016.
Although the privacy issues in human genomic studies are well known, the privacy risks in clinical proteomic data have not been thoroughly studied. As a proof of concept, we report a comprehensive analysis of the privacy risks in clinical proteomic data. The analysis shows that a small number of peptides carrying minor alleles (referred to as minor allelic peptides) at non-synonymous single nucleotide polymorphism (nsSNP) sites can be identified in typical clinical proteomic datasets acquired from the blood/serum samples of individual patients, and that these peptides suffice to identify the patient with high confidence. Our results suggest significant privacy risks in raw clinical proteomic data. However, these risks can be mitigated by a straightforward pre-processing step that removes from the raw data the very small fraction (about 0.1%; on average 7.14 of 7,504 spectra) of MS/MS spectra identified as minor allelic peptides, which has little or no impact on the subsequent analysis (and re-use) of these datasets.
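The pre-processing step described in the abstract can be illustrated with a minimal sketch. It assumes that each MS/MS spectrum has already been assigned a peptide identification (or none) and that a set of minor allelic peptide sequences is available. The `Spectrum` type, the function name, and the example sequences are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Spectrum:
    """Hypothetical record for one MS/MS spectrum (not the paper's data model)."""
    scan_id: int
    peptide: Optional[str]  # identified peptide sequence, or None if unidentified


def remove_minor_allelic_spectra(
    spectra: Iterable[Spectrum],
    minor_allelic_peptides: Set[str],
) -> Tuple[List[Spectrum], float]:
    """Drop spectra identified as minor allelic peptides.

    Returns the retained spectra and the fraction of spectra removed.
    """
    all_spectra = list(spectra)
    # Keep unidentified spectra and spectra whose peptide is not in the set.
    kept = [s for s in all_spectra if s.peptide not in minor_allelic_peptides]
    removed = len(all_spectra) - len(kept)
    fraction_removed = removed / len(all_spectra) if all_spectra else 0.0
    return kept, fraction_removed


# Toy input: the sequences below are made up for illustration only.
example = [
    Spectrum(1, "PEPTIDEK"),
    Spectrum(2, "MINORPEPK"),
    Spectrum(3, None),
    Spectrum(4, "PEPTIDER"),
]
kept, fraction_removed = remove_minor_allelic_spectra(example, {"MINORPEPK"})
```

With the averages reported in the abstract, the removed fraction would be 7.14 / 7,504, or roughly 0.095%, which is consistent with the quoted 0.1%.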