Boerner Vinzent
Animal Genetics and Breeding Unit, University of New England, Armidale, 2351, Australia.
Genet Sel Evol. 2017 Jun 15;49(1):50. doi: 10.1186/s12711-017-0324-3.
Parentage verification by molecular markers is mainly based on short tandem repeat markers. Single nucleotide polymorphisms (SNPs), which are bi-allelic markers, have become the markers of choice for genotyping projects. Thus, the subsequent step is to use SNP genotypes for parentage verification as well. Recently developed algorithms, such as those that evaluate opposing homozygous SNP genotypes, have drawbacks, for example the inability to reject all animals in a sample of potential parents. This paper describes an algorithm for parentage verification by constrained regression which overcomes the latter limitation and proves to be very fast and accurate even when the number of SNPs is as low as 50. The algorithm was tested on a sample of 14,816 animals with 50, 100 and 500 SNP genotypes randomly selected from 40k genotypes. The samples of putative parents of these animals contained either five random animals, or four random animals and the true sire. Parentage was assigned either by ranking the regression coefficients or by setting a minimum threshold for the regression coefficients. The assignment quality was evaluated by the power of assignment ($P_a$) and the power of exclusion ($P_e$).
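The idea of regressing an offspring's genotypes on those of the putative parents, and then assigning by rank or by threshold, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: genotypes are coded as -1/0/1 (allele counts 0/1/2 minus 1), coefficients are bounded to [0, 1], the solver is a simple cyclic coordinate descent, the function names are hypothetical, and the threshold of 0.4 is arbitrary.

```python
from typing import List, Optional, Sequence


def constrained_regression(
    offspring: Sequence[float],
    candidates: Sequence[Sequence[float]],
    lower: float = 0.0,
    upper: float = 1.0,
    sweeps: int = 200,
) -> List[float]:
    """Box-constrained least squares via cyclic coordinate descent.

    Minimises ||offspring - sum_j b_j * candidates[j]||^2 subject to
    lower <= b_j <= upper. Bounds and solver are assumptions of this sketch.
    """
    coefs = [0.0] * len(candidates)
    residual = list(offspring)  # residual = y - X b, with b = 0 initially
    for _ in range(sweeps):
        for j, geno in enumerate(candidates):
            norm = sum(g * g for g in geno)
            if norm == 0.0:
                continue  # an all-zero column carries no information
            # Best unconstrained b_j with all other coefficients held fixed.
            rho = sum(g * r for g, r in zip(geno, residual)) + norm * coefs[j]
            new = min(upper, max(lower, rho / norm))  # project onto the box
            delta = new - coefs[j]
            if delta != 0.0:
                residual = [r - delta * g for r, g in zip(residual, geno)]
                coefs[j] = new
    return coefs


def assign_by_rank(coefs: Sequence[float]) -> int:
    """Index of the candidate with the largest regression coefficient."""
    return max(range(len(coefs)), key=lambda j: coefs[j])


def assign_by_threshold(coefs: Sequence[float], threshold: float) -> Optional[int]:
    """Best-ranked candidate if it reaches the threshold, otherwise None."""
    best = assign_by_rank(coefs)
    return best if coefs[best] >= threshold else None


# Toy data (hypothetical): 8 SNPs, centred genotype codes -1/0/1.
sire = [1, 1, 1, 1, -1, -1, -1, -1]
other_a = [1, 1, -1, -1, 1, 1, -1, -1]
other_b = [1, -1, 1, -1, 1, -1, 1, -1]
offspring = [1, 0, 0, 1, -1, 0, 0, -1]  # consistent with `sire` at every SNP

with_sire = constrained_regression(offspring, [other_a, sire, other_b])
without_sire = constrained_regression(offspring, [other_a, other_b])

print(with_sire, assign_by_rank(with_sire), assign_by_threshold(with_sire, 0.4))
print(without_sire, assign_by_threshold(without_sire, 0.4))
```

In this toy case the true sire receives a coefficient of 0.5 and the unrelated candidates receive 0, so both rules pick the sire when it is present. When the sire is absent, the threshold rule returns `None`, which is the "reject every candidate" outcome that ranking alone cannot produce.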
If the sample of putative parents contained the true sire and parentage was assigned by coefficient ranking, the power of assignment ($P_a$) and the power of exclusion ($P_e$) were both higher than 0.99 for the 500 and 100 SNP genotypes, and higher than 0.98 for the 50 SNP genotypes. When parentage was assigned by a coefficient threshold, $P_e$ was higher than 0.99 regardless of the number of SNPs, but $P_a$ decreased from 0.99 (500 SNPs) to 0.97 (100 SNPs) and 0.92 (50 SNPs). If the sample of putative parents did not contain the true sire and parentage was rejected using a coefficient threshold, the algorithm achieved a $P_e$ of 1 (500 SNPs), 0.99 (100 SNPs) and 0.97 (50 SNPs).
The algorithm described here is easy to implement, fast and accurate, and is able to assign parentage using genomic marker data with as few as 50 SNPs.