Computer Science and AI Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
Genome Res. 2023 Jul;33(7):1101-1112. doi: 10.1101/gr.277699.123. Epub 2023 Aug 4.
Gene expression data provide molecular insights into the functional impact of genetic variation, for example, through expression quantitative trait loci (eQTLs). With an improving understanding of the association between genotypes and gene expression comes a greater concern that gene expression profiles could be matched to genotype profiles of the same individuals in another data set, a scenario known as a linking attack. Prior works demonstrating such a risk could analyze only the fraction of eQTLs that are independent, owing to restrictive model assumptions, leaving the full extent of this risk incompletely understood. To address this challenge, we introduce the discriminative sequence model (DSM), a novel probabilistic framework for predicting a sequence of genotypes based on gene expression data. By modeling the joint distribution over all known eQTLs in a genomic region, DSM improves the power of linking attacks with necessary calibration for linkage disequilibrium and redundant predictive signals. We show greater linking accuracy of DSM compared with existing approaches across a range of attack scenarios and data sets including up to 22,288 individuals, suggesting that DSM helps uncover a substantial additional risk overlooked by previous studies. Our work provides a unified framework for assessing the privacy risks of sharing diverse omics data sets beyond transcriptomics.
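To make the notion of a linking attack concrete, the following is a minimal sketch of a naive baseline, not the DSM described in the abstract. It assumes simulated data in which every eQTL is independent (no linkage disequilibrium), each eQTL affects exactly one gene with a known linear effect, and noise is Gaussian. All names (`simulate`, `link`) and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate(n_individuals, n_eqtls, maf, beta, sigma, rng):
    """Simulate genotypes and expression under an independent-eQTL assumption.

    Hypothetical toy model: each eQTL k drives one gene k with effect `beta`.
    """
    # Genotypes coded as minor-allele counts in {0, 1, 2}.
    genotypes = rng.binomial(2, maf, size=(n_individuals, n_eqtls))
    noise = rng.normal(0.0, sigma, size=(n_individuals, n_eqtls))
    expression = beta * genotypes + noise
    return genotypes, expression


def link(expression, genotypes, beta):
    """Match each expression profile to its best-scoring genotype profile.

    The score is the negative squared error between observed expression and
    the expression predicted from a candidate genotype profile. Under the
    Gaussian toy model this ranks candidates by log-likelihood.
    """
    predicted_expr = beta * genotypes                        # (n_geno, n_eqtls)
    diff = expression[:, None, :] - predicted_expr[None, :, :]
    scores = -np.sum(diff ** 2, axis=2)                      # (n_expr, n_geno)
    return np.argmax(scores, axis=1)


rng = np.random.default_rng(0)
genotypes, expression = simulate(
    n_individuals=100, n_eqtls=200, maf=0.3, beta=1.0, sigma=1.0, rng=rng
)

# Shuffle the genotype data set so that row order carries no identity.
perm = rng.permutation(len(genotypes))
shuffled_genotypes = genotypes[perm]
true_match = np.argsort(perm)  # row of shuffled_genotypes for each individual

predicted = link(expression, shuffled_genotypes, beta=1.0)
accuracy = float(np.mean(predicted == true_match))
print(f"linking accuracy: {accuracy:.2f}")
```

Because the score simply sums per-eQTL terms, this baseline would double-count correlated eQTLs in real data. That is the kind of limitation the abstract attributes to prior approaches and that DSM is said to address by modeling the joint distribution over eQTLs in a region.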