Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; New York Genome Center, New York, NY 10013, USA.
Department of Computer Science, Brown University, Providence, RI 02912, USA.
Cell. 2024 Nov 14;187(23):6537-6549.e10. doi: 10.1016/j.cell.2024.09.012. Epub 2024 Oct 2.
The growing number of publicly available human single-cell datasets, encompassing millions of cells from many donors, has substantially advanced our understanding of complex biological processes. However, the accessibility of these datasets raises serious privacy concerns. Because single-cell measurements are inherently noisy and population-scale single-cell datasets have been scarce, recent studies quantifying private information leakage have focused on the sharing of bulk gene expression data. To address this gap, we demonstrate that individuals in single-cell gene expression datasets are vulnerable to linking attacks, in which attackers infer sensitive phenotypic information using publicly available tissue- or cell-type-specific expression quantitative trait loci (eQTL) information. We further develop a method for genotype prediction and genotype-phenotype linking that remains effective without relying on eQTL information. We show that variants from one study can be exploited to uncover private information about individuals in another study.
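To make the idea of a linking attack concrete, the following is a minimal sketch on fully synthetic data. It is not the method developed in the study. It assumes a simplified additive model in which each gene's expression shifts with the genotype (0, 1, or 2 alternate alleles) at a single known eQTL, and that the attacker knows only the direction of each effect. The function names `simulate` and `link_individuals`, and all parameter values, are hypothetical and chosen for illustration.

```python
import numpy as np


def simulate(n_individuals=50, n_eqtls=200, effect=1.0, noise=0.5, seed=0):
    """Generate synthetic data (assumed additive eQTL model, not real data)."""
    rng = np.random.default_rng(seed)
    # Minor allele frequency per variant, then genotypes coded 0/1/2.
    maf = rng.uniform(0.1, 0.5, size=n_eqtls)
    genotypes = rng.binomial(2, maf, size=(n_individuals, n_eqtls))
    # Each eQTL raises or lowers expression of its gene (direction is public).
    signs = rng.choice([-1.0, 1.0], size=n_eqtls)
    expression = effect * signs * genotypes + rng.normal(
        0.0, noise, size=genotypes.shape
    )
    # The genotype database holds the same people in a shuffled order.
    order = rng.permutation(n_individuals)
    database = genotypes[order]
    truth = np.argsort(order)  # truth[i] = database row of individual i
    return expression, database, signs, truth


def link_individuals(expression, database, signs):
    """Match each expression profile to its most correlated genotype record."""
    # Orient expression so that higher values suggest more alternate alleles.
    oriented = expression * signs
    z_expr = (oriented - oriented.mean(axis=0)) / oriented.std(axis=0)
    geno = database.astype(float)
    # Small constant guards against a variant that is constant in the database.
    z_geno = (geno - geno.mean(axis=0)) / (geno.std(axis=0) + 1e-12)
    # scores[i, j]: average agreement between profile i and genotype record j.
    scores = z_expr @ z_geno.T / expression.shape[1]
    return scores.argmax(axis=1)


if __name__ == "__main__":
    expression, database, signs, truth = simulate()
    predicted = link_individuals(expression, database, signs)
    print("fraction correctly linked:", (predicted == truth).mean())
```

In this low-noise toy setting the match rate should be high. Raising `noise` or lowering `n_eqtls` weakens the link, which loosely mirrors the role that measurement noise plays in single-cell data.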