Quality control of genotypes using heritability estimates of gene content at the marker.

Authors

Forneris Natalia S, Legarra Andres, Vitezica Zulma G, Tsuruta Shogo, Aguilar Ignacio, Misztal Ignacy, Cantet Rodolfo J C

Affiliations

Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, C1417DSE Buenos Aires, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, C1033AAJ Buenos Aires, Argentina.

INRA, Génétique, Physiologie et Systèmes d'Elevage (GenPhySE), F-31326 Castanet-Tolosan, France; Université de Toulouse, INP, ENSAT, Génétique, Physiologie et Systèmes d'Elevage (GenPhySE), F-31326 Castanet-Tolosan, France.

Publication information

Genetics. 2015 Mar;199(3):675-81. doi: 10.1534/genetics.114.173559. Epub 2015 Jan 6.

DOI:10.1534/genetics.114.173559

PMID:25567991

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4349063/

Abstract

Quality control filtering of single-nucleotide polymorphisms (SNPs) is a key step when analyzing genomic data. Here we present a practical method to identify low-quality SNPs, meaning markers whose genotypes are wrongly assigned for a large proportion of individuals, by estimating the heritability of gene content at each marker, where gene content is the number of copies of a particular reference allele in a genotype of an animal (0, 1, or 2). If there is no mutation at the marker, gene content has an additive heritability of 1 by construction. The method uses restricted maximum likelihood (REML) to estimate heritability of gene content at each SNP and also builds a likelihood-ratio test statistic to test for zero error variance in genotyping. As a by-product, estimates of the allele frequencies of markers at the base population are obtained. Using simulated data with 10% permutation error (4% actual error) in genotyping, the method had a specificity of 0.96 (4% of correct markers are rejected) and a sensitivity of 0.99 (1% of wrong markers are accepted) if markers with heritability lower than 0.975 are discarded. Checking of Mendelian errors resulted in a lower sensitivity (0.84) for the same simulation. The proposed method is further illustrated with a real data set with genotypes from 3534 animals genotyped for 50,433 markers from the Illumina PorcineSNP60 chip and a pedigree of 6473 individuals; those markers underwent very little quality control. A total of 4099 markers with P-values lower than 0.01 were discarded based on our method, with associated estimates of heritability as low as 0.12. Contrary to other techniques, our method uses all information in the population simultaneously, can be used in any population with markers and pedigree recordings, and is simple to implement using standard software for REML estimation. Scripts for its use are provided.
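The two decision rules mentioned in the abstract (a likelihood-ratio test for zero error variance, and a heritability cutoff of 0.975 scored by sensitivity and specificity) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' scripts: the function names are hypothetical, the REML log-likelihoods and heritability estimates are assumed to come from whatever REML software was run per marker, and the 50:50 mixture of chi-square distributions with 0 and 1 degrees of freedom is assumed here as the usual reference distribution for a variance tested on its boundary.

```python
import math


def lrt_pvalue(loglik_full, loglik_null):
    """P-value for H0: genotyping error variance is zero.

    loglik_full: REML log-likelihood with the error variance estimated.
    loglik_null: REML log-likelihood with the error variance fixed at zero.
    ASSUMPTION: the statistic follows a 50:50 mixture of chi-square(0)
    and chi-square(1), as is common for boundary tests of a variance.
    """
    lrt = max(0.0, 2.0 * (loglik_full - loglik_null))
    if lrt == 0.0:
        return 1.0
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)); halve it for the mixture.
    return 0.5 * math.erfc(math.sqrt(lrt / 2.0))


def sensitivity_specificity(h2_estimates, is_bad, threshold=0.975):
    """Score the rule 'discard markers with estimated heritability < threshold'.

    is_bad[i] is True when marker i truly carries genotyping errors
    (known only in simulation). Needs at least one bad and one good marker.
    Sensitivity = share of bad markers discarded;
    specificity = share of good markers kept.
    """
    discarded = [h2 < threshold for h2 in h2_estimates]
    true_pos = sum(d and b for d, b in zip(discarded, is_bad))
    true_neg = sum((not d) and (not b) for d, b in zip(discarded, is_bad))
    n_bad = sum(is_bad)
    n_good = len(is_bad) - n_bad
    return true_pos / n_bad, true_neg / n_good


if __name__ == "__main__":
    # Made-up numbers, purely to show the calls.
    print(lrt_pvalue(-100.0, -103.3174))
    print(sensitivity_specificity(
        [1.0, 0.99, 0.97, 0.5, 0.98, 0.3],
        [False, False, False, True, True, True],
    ))
```

In this sketch a marker would be discarded either when its P-value falls below a chosen level (0.01 in the real data set described above) or when its heritability estimate falls below the cutoff; how the two rules are combined is left to the user.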

Similar articles

Quality control of genotypes using heritability estimates of gene content at the marker.

Genetics. 2015 Mar;199(3):675-81. doi: 10.1534/genetics.114.173559. Epub 2015 Jan 6.

Bias in heritability estimates from genomic restricted maximum likelihood methods under different genotyping strategies.

J Anim Breed Genet. 2019 Jan;136(1):40-50. doi: 10.1111/jbg.12367. Epub 2018 Nov 13.

Estimates of missing heritability for complex traits in Brown Swiss cattle.

Genet Sel Evol. 2014 Jun 4;46(1):36. doi: 10.1186/1297-9686-46-36.

RAD-sequencing for estimating genomic relatedness matrix-based heritability in the wild: A case study in roe deer.

Mol Ecol Resour. 2019 Sep;19(5):1205-1217. doi: 10.1111/1755-0998.13031. Epub 2019 Jun 12.

Estimation of heritability with genomic information by method R.

J Anim Breed Genet. 2024 Sep;141(5):550-558. doi: 10.1111/jbg.12863. Epub 2024 Mar 25.

Efficient and accurate computation of base generation allele frequencies.

J Dairy Sci. 2019 Feb;102(2):1364-1373. doi: 10.3168/jds.2018-15264. Epub 2018 Nov 22.

Accuracy of genotype imputation in Nelore cattle.

Genet Sel Evol. 2014 Oct 10;46(1):69. doi: 10.1186/s12711-014-0069-1.

SNP-based heritability estimation using a Bayesian approach.

Animal. 2013 Apr;7(4):531-9. doi: 10.1017/S1751731112002017. Epub 2012 Nov 23.

Allele frequency calibration for SNP based genotyping of DNA pools: A regression based local-global error fusion method.

Comput Biol Med. 2015 Jun;61:48-55. doi: 10.1016/j.compbiomed.2015.03.020. Epub 2015 Mar 26.

Effect of genomic selection and genotyping strategy on estimation of variance components in animal models using different relationship matrices.

Genet Sel Evol. 2020 Jun 11;52(1):31. doi: 10.1186/s12711-020-00550-w.

Cited by

Predicted breeding values for relative scrapie susceptibility for genotyped and ungenotyped sheep.

Genet Sel Evol. 2024 Dec 18;56(1):77. doi: 10.1186/s12711-024-00947-x.

Interpreting single-step genomic evaluation as a neural network of three layers: pedigree, genotypes, and phenotypes.

Genet Sel Evol. 2023 Oct 3;55(1):68. doi: 10.1186/s12711-023-00838-7.

Single-step genomic evaluation with metafounders for feed conversion ratio and average daily gain in Danish Landrace and Yorkshire pigs.

Genet Sel Evol. 2021 Oct 7;53(1):79. doi: 10.1186/s12711-021-00670-x.

Theoretical and empirical comparisons of expected and realized relationships for the X-chromosome.

Genet Sel Evol. 2020 Aug 20;52(1):50. doi: 10.1186/s12711-020-00570-6.

Bias and accuracy of dairy sheep evaluations using BLUP and SSGBLUP with metafounders and unknown parent groups.

Genet Sel Evol. 2020 Aug 12;52(1):47. doi: 10.1186/s12711-020-00567-1.

Genome-wide association study for feed efficiency in collective cage-raised rabbits under full and restricted feeding.

Anim Genet. 2020 Oct;51(5):799-810. doi: 10.1111/age.12988. Epub 2020 Jul 22.

Development of diagnostic SNP markers for quality assurance and control in sweetpotato [Ipomoea batatas (L.) Lam.] breeding programs.

PLoS One. 2020 Apr 24;15(4):e0232173. doi: 10.1371/journal.pone.0232173. eCollection 2020.

Estimation of indirect social genetic effects for skin lesion count in group-housed pigs by quantifying behavioral interactions.

J Anim Sci. 2019 Sep 3;97(9):3658-3668. doi: 10.1093/jas/skz244.

Metafounders are related to F fixation indices and reduce bias in single-step genomic evaluations.

Genet Sel Evol. 2017 Mar 10;49(1):34. doi: 10.1186/s12711-017-0309-2.

Walking through the statistical black boxes of plant breeding.

Theor Appl Genet. 2016 Oct;129(10):1933-49. doi: 10.1007/s00122-016-2750-y. Epub 2016 Jul 19.

References

Variance of gene frequencies.

Evolution. 1969 Mar;23(1):72-84. doi: 10.1111/j.1558-5646.1969.tb03496.x.

Differences between genomic-based and pedigree-based relationships in a chicken population, as a function of quality control and pedigree links among individuals.

J Anim Breed Genet. 2014 Dec;131(6):445-51. doi: 10.1111/jbg.12109. Epub 2014 Jul 15.

Detection of Mendelian consistent genotyping errors in pedigrees.

Genet Epidemiol. 2014 May;38(4):291-9. doi: 10.1002/gepi.21806. Epub 2014 Apr 9.

Accuracy of genomic prediction using an evenly spaced, low-density single nucleotide polymorphism panel in broiler chickens.

Poult Sci. 2013 Jul;92(7):1712-23. doi: 10.3382/ps.2012-02941.

Confirmation and discovery of maternal grandsires and great-grandsires in dairy cattle.

J Dairy Sci. 2013 Mar;96(3):1874-9. doi: 10.3168/jds.2012-6176. Epub 2013 Jan 16.

A common dataset for genomic analysis of livestock populations.

G3 (Bethesda). 2012 Apr;2(4):429-35. doi: 10.1534/g3.111.001453. Epub 2012 Apr 1.

A simple method to approximate gene content in large pedigree populations: application to the myostatin gene in dual-purpose Belgian Blue cattle.

Animal. 2007 Feb;1(1):21-8. doi: 10.1017/S1751731107392628.

Use of the Illumina Bovine3K BeadChip in dairy genomic evaluation.

J Dairy Sci. 2012 Mar;95(3):1552-8. doi: 10.3168/jds.2011-4985.

A note on the rationale for estimating genealogical coancestry from molecular markers.

Genet Sel Evol. 2011 Jul 12;43(1):1-10. doi: 10.1186/1297-9686-43-27.

Efficient parentage assignment and pedigree reconstruction with dense single nucleotide polymorphism data.

J Dairy Sci. 2011 Apr;94(4):2114-7. doi: 10.3168/jds.2010-3896.
