Inserm, Univ Brest, EFS, UMR 1078, GGB, 29200, Brest, France.
Institute of Genetics and Biophysics A. Buzzati-Traverso - CNR, Naples, Italy.
BMC Bioinformatics. 2022 Jun 24;23(1):254. doi: 10.1186/s12859-022-04795-8.
Estimating relatedness is an important step in many genetic study designs. A variety of methods have been proposed for estimating pairwise relatedness coefficients from genotype data. Both the kinship coefficient and the fraternity coefficient are of interest for all pairs of individuals. However, with low-depth sequencing or imputed data, individual-level genotypes cannot be called with confidence, and ignoring this uncertainty is known to bias the estimates. Accordingly, methods have recently been developed to estimate kinship from uncertain genotypes.
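The bias from ignoring genotype uncertainty can be illustrated with a minimal Python sketch. It is not the LowKi estimator, which this abstract does not specify. It is a naive plug-in approach that assumes known allele frequencies, a Hardy-Weinberg prior, and genotype likelihoods stored as an array of shape (individuals, SNPs, 3). The function names are hypothetical. The likelihoods are converted to posterior genotype probabilities, averaged into expected dosages, and plugged into a GRM-style kinship estimate (kinship taken as half the genomic relationship).

```python
import numpy as np


def posterior_genotype_probs(gl: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Posterior P(g | reads) for g in {0, 1, 2} alternate-allele copies.

    gl: (n_ind, n_snp, 3) genotype likelihoods P(reads | g), natural scale.
    p:  (n_snp,) alternate-allele frequencies, assumed known here.
    The Hardy-Weinberg prior is an illustrative assumption.
    """
    prior = np.stack([(1 - p) ** 2, 2 * p * (1 - p), p ** 2], axis=-1)  # (n_snp, 3)
    unnorm = gl * prior  # broadcasts the prior over individuals
    return unnorm / unnorm.sum(axis=-1, keepdims=True)


def expected_dosage(post: np.ndarray) -> np.ndarray:
    """Posterior mean genotype E[g | reads], shape (n_ind, n_snp)."""
    return post @ np.array([0.0, 1.0, 2.0])


def naive_kinship(dosage: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Plug-in kinship matrix, phi_ij taken as GRM_ij / 2, from dosages.

    NOT the LowKi estimator: at low depth the dosages are shrunk toward 2p,
    which pulls these estimates toward 0.
    """
    z = (dosage - 2 * p) / np.sqrt(2 * p * (1 - p))  # standardized dosages
    grm = z @ z.T / dosage.shape[1]
    return grm / 2
```

With perfectly informative (one-hot) likelihoods the expected dosages equal the true genotypes. With completely flat likelihoods every dosage collapses to 2p, so the standardized values, and hence the kinship estimates, shrink to 0. Intermediate sequencing depths fall in between, which is one way the bias described above arises.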
We present new method-of-moments estimators of both coefficients, calculated directly from genotype likelihoods. We simulated low-depth genetic data for a sample of individuals with extensive relatedness, using the complex pedigree of the known genetic isolates of Cilento in southern Italy. Through this simulation, we explore the behaviour of our estimators, demonstrate their properties, and show their advantages over alternative methods. We also demonstrate the method on a sample of 150 French individuals with down-sampled sequencing data.
We find that our method provides accurate relatedness estimates whilst offering advantages over existing methods in robustness, independence from external software, and computation time. The method presented in this paper, referred to as LowKi (Low-depth Kinship), is available as an R package (https://github.com/genostats/LowKi).