Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Department of Mathematics, Hanyang University, Seoul 04763, Republic of Korea.
Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac473.
Estimation of genetic relatedness, or kinship, is used occasionally for recreational purposes and in forensic applications. While numerous methods have been developed to estimate kinship, they suffer from high computational requirements and often make the untenable assumption that the samples share a homogeneous population ancestry. Moreover, genetic privacy is generally overlooked in the use of kinship estimation methods. Finding unknown familial relationships in third-party databases can raise ethical concerns. Similar concerns may arise when estimating and reporting sensitive population-level statistics such as inbreeding coefficients, because of the risk of marginalization and stigmatization.
Here, we present SIGFRIED, which uses existing reference panels with a projection-based approach that simplifies kinship estimation in admixed populations. We use simulated and real datasets to demonstrate the accuracy and efficiency of kinship estimation. We present a secure federated kinship estimation framework and implement a secure kinship estimator using homomorphic encryption-based primitives for computing relatedness between samples at two different sites while genotype data are kept confidential. Source code and documentation for our methods can be found at https://doi.org/10.5281/zenodo.7053352.
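To make the projection idea concrete, the following is a minimal plaintext sketch, not the SIGFRIED implementation. It assumes two hypothetical reference populations with known allele frequencies (`p1`, `p2`), projects each genotype vector onto the line between them to obtain an admixture fraction, derives individual-specific allele frequencies, and plugs these into a generic ratio-of-sums kinship estimator of the kind used by ancestry-adjusted methods. All function names, the two-population setup and the simulated data are assumptions made for illustration; no encryption is involved.

```python
import random


def admixture_fraction(genotypes, p1, p2):
    """Least-squares projection of g/2 onto the segment between p1 and p2.

    Assumes p1 and p2 differ at some SNP (otherwise the denominator is zero).
    """
    num = sum((g / 2 - b) * (a - b) for g, a, b in zip(genotypes, p1, p2))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    return min(1.0, max(0.0, num / den))  # clip to a valid proportion


def individual_frequencies(alpha, p1, p2, eps=0.01):
    """Individual-specific allele frequencies, kept away from 0 and 1."""
    return [min(1 - eps, max(eps, alpha * a + (1 - alpha) * b))
            for a, b in zip(p1, p2)]


def kinship(g_i, g_j, q_i, q_j):
    """Generic ancestry-adjusted kinship estimate (ratio of sums over SNPs)."""
    num = sum((a - 2 * qa) * (b - 2 * qb)
              for a, b, qa, qb in zip(g_i, g_j, q_i, q_j))
    den = sum(4 * (qa * (1 - qa) * qb * (1 - qb)) ** 0.5
              for qa, qb in zip(q_i, q_j))
    return num / den


def draw_genotype(freqs, rng):
    """Simulate a genotype (0, 1 or 2 alternate alleles) at each SNP."""
    return [(rng.random() < q) + (rng.random() < q) for q in freqs]


def transmit(genotypes, rng):
    """Simulate the single allele a parent passes on at each SNP."""
    return [1 if rng.random() < g / 2 else 0 for g in genotypes]


def demo(n_snps=2000, seed=7):
    """Simulated admixed trio plus one unrelated individual (toy data)."""
    rng = random.Random(seed)
    p1 = [0.2 + 0.6 * rng.random() for _ in range(n_snps)]
    p2 = [0.2 + 0.6 * rng.random() for _ in range(n_snps)]
    mother = draw_genotype(individual_frequencies(0.3, p1, p2), rng)
    father = draw_genotype(individual_frequencies(0.7, p1, p2), rng)
    child = [m + f for m, f in zip(transmit(mother, rng), transmit(father, rng))]
    stranger = draw_genotype(individual_frequencies(0.5, p1, p2), rng)

    def freqs(g):
        # Estimate ancestry by projection, then individual-specific frequencies.
        return individual_frequencies(admixture_fraction(g, p1, p2), p1, p2)

    return {
        "mother_child": kinship(mother, child, freqs(mother), freqs(child)),
        "mother_stranger": kinship(mother, stranger, freqs(mother), freqs(stranger)),
    }
```

Calling `demo()` returns kinship estimates for a simulated parent-child pair and an unrelated pair. Under this toy model the former should fall near the expected 0.25 and the latter near 0. In a federated setting, the sums in `kinship` are the quantities that would have to be evaluated on encrypted inputs, which this sketch does not attempt.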
Analysis of relatedness is fundamentally important for identifying relatives, for association studies and for population-level estimates of inbreeding. As awareness of individual and group genomic privacy grows, privacy-preserving methods for estimating relatedness are needed. The presented methods alleviate ethical and privacy concerns in the analysis of relatedness in admixed, historically isolated and underrepresented populations.
Genetic relatedness is a central quantity used for finding relatives in databases, for correcting biases in genome-wide association studies and for estimating population-level statistics. Methods for estimating genetic relatedness have high computational requirements and often do not account for individuals of admixed ancestry. Furthermore, the ethical concerns around using genetic data and calculating relatedness are generally not considered. We present a projection-based approach that can efficiently and accurately estimate kinship. We implement our method using encryption-based techniques that provide provable security guarantees to protect genetic data while kinship statistics are computed among multiple sites.