Department of Computer Science, Applied Mathematics and Statistics, Universitat de Girona, Girona, Spain.
Genomes For Life - GCAT lab, Institute for Health Science Research Germans Trias i Pujol (IGTP), Can Ruti Campus, Badalona, Barcelona, Spain.
Heredity (Edinb). 2021 Mar;126(3):537-547. doi: 10.1038/s41437-020-00392-8. Epub 2021 Jan 15.
The detection of family relationships in genetic databases is of interest in various scientific disciplines, such as genetic epidemiology, population and conservation genetics, forensic science, and genealogical research. Nowadays, screening genetic databases for related individuals forms an important aspect of standard quality control procedures. Relatedness research is usually based on an allele-sharing analysis of identity-by-state (IBS) or identity-by-descent (IBD) alleles. Existing IBS/IBD methods mainly aim to identify first-degree relationships (parent-offspring or full siblings) and second-degree relationships (half-sibling, avuncular, or grandparent-grandchild pairs). Little attention has been paid to detecting relationships that fall between the first and second degree, such as three-quarter siblings (3/4S), who share fewer alleles than first-degree relatives but more alleles than second-degree relatives. With the progressively increasing sample sizes used in genetic research, it becomes more likely that such relationships are present in the database under study. In this paper, we extend existing likelihood ratio (LR) methodology to accurately infer the existence of 3/4S, distinguishing them from full siblings and second-degree relatives. We use bootstrap confidence intervals to express uncertainty in the LRs. Our proposal accounts for linkage disequilibrium (LD) by using marker pruning, and we validate our methodology with a pedigree-based simulation study accounting for both LD and recombination. An empirical genome-wide array data set from the GCAT Genomes for Life cohort project is used to illustrate the method.
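To make the LR idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes independent biallelic markers (for example, after LD pruning), Hardy-Weinberg proportions, known allele frequencies, and no genotyping error. Each relationship is represented by its standard IBD probabilities (k0, k1, k2): (1/4, 1/2, 1/4) for full siblings, (3/8, 1/2, 1/8) for 3/4S, and (1/2, 1/2, 0) for second-degree pairs. The function names, the genotype coding (0, 1, 2 copies of the reference allele), and the marker-resampling percentile bootstrap are illustrative assumptions.

```python
import math
import random

# IBD probabilities (k0, k1, k2) for the relationships being compared.
IBD = {
    "FS": (0.25, 0.5, 0.25),     # full siblings
    "3/4S": (0.375, 0.5, 0.125), # three-quarter siblings
    "2nd": (0.5, 0.5, 0.0),      # half-sibling, avuncular, grandparent-grandchild
}

def hwe(g, p):
    """P(genotype g) under Hardy-Weinberg; g counts alleles with frequency p."""
    q = 1.0 - p
    return (q * q, 2 * p * q, p * p)[g]

def cond_ibd1(g2, g1, p):
    """P(G2 = g2 | G1 = g1) when the pair shares exactly one allele IBD."""
    q = 1.0 - p
    table = {0: (q, p, 0.0), 1: (q / 2, 0.5, p / 2), 2: (0.0, q, p)}
    return table[g1][g2]

def pair_prob(g1, g2, p, k):
    """P(G1, G2) for a pair with IBD probabilities k = (k0, k1, k2)."""
    k0, k1, k2 = k
    p0 = hwe(g1, p) * hwe(g2, p)            # no alleles IBD
    p1 = hwe(g1, p) * cond_ibd1(g2, g1, p)  # one allele IBD
    p2 = hwe(g1, p) if g1 == g2 else 0.0    # two alleles IBD
    return k0 * p0 + k1 * p1 + k2 * p2

def marker_log_lrs(geno1, geno2, freqs, num, den):
    """Per-marker log LR of relationship `num` versus relationship `den`."""
    return [
        math.log(pair_prob(a, b, p, IBD[num]))
        - math.log(pair_prob(a, b, p, IBD[den]))
        for a, b, p in zip(geno1, geno2, freqs)
    ]

def bootstrap_ci(log_lrs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the total log LR, resampling markers."""
    rng = random.Random(seed)
    m = len(log_lrs)
    totals = sorted(sum(rng.choices(log_lrs, k=m)) for _ in range(n_boot))
    lower = totals[int(alpha / 2 * n_boot)]
    upper = totals[max(int((1 - alpha / 2) * n_boot) - 1, 0)]
    return lower, upper

# Toy data (hypothetical): genotypes of two individuals at five markers.
geno_a = [0, 1, 2, 1, 1]
geno_b = [0, 1, 1, 1, 2]
freqs = [0.2, 0.5, 0.7, 0.4, 0.6]
llr = marker_log_lrs(geno_a, geno_b, freqs, "3/4S", "FS")
total_log_lr = sum(llr)
ci = bootstrap_ci(llr)
```

A positive `total_log_lr` favours 3/4S over full siblings, and an interval that excludes zero suggests the ranking is not driven by a handful of markers. Real analyses would use many thousands of pruned markers rather than the five toy markers here.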