SNP Data Quality Control in a National Beef and Dairy Cattle System and Highly Accurate SNP Based Parentage Verification and Identification.

Author information

McClure Matthew C, McCarthy John, Flynn Paul, McClure Jennifer C, Dair Emma, O'Connell D K, Kearney John F

Affiliations

Irish Cattle Breeding Federation, Cork, Ireland.

Weatherbys Ireland, Kildare, Ireland.

Publication information

Front Genet. 2018 Mar 15;9:84. doi: 10.3389/fgene.2018.00084. eCollection 2018.

Abstract

A major use of genetic data is parentage verification and identification, as inaccurate pedigrees negatively affect genetic gain. Since 2012 the international standard for single nucleotide polymorphism (SNP) verification in cattle has been the ISAG SNP panels. While these ISAG panels provide an increased level of parentage accuracy over microsatellite markers (MS), they can validate the wrong parent at ≤1% misconcordance rate levels, indicating that more SNP are needed if a more accurate pedigree is required. Rapidly increasing numbers of cattle are being genotyped in Ireland, representing 61 breeds from a wide range of farm types: beef/dairy, AI/pedigree/commercial, purebred/crossbred, and large to small herd size. Using these data, the Irish Cattle Breeding Federation (ICBF) analyzed different SNP densities and determined that at least 500 SNP are needed to consistently predict only one set of parents at a ≤1% misconcordance rate. For parentage validation and prediction, ICBF uses 800 SNP (ICBF800) selected based on SNP clustering quality, ISAG200 inclusion, call rate (CR), and minor allele frequency (MAF) in the Irish cattle population. Large datasets require sample and SNP quality control (QC). Most publications only deal with SNP QC via CR, MAF, parent-progeny conflicts, and Hardy-Weinberg deviation, but not sample QC. We report here parentage, SNP QC, and genomic sample QC pipelines to deal with the unique challenges of >1 million genotypes from a national herd, such as SNP genotype errors from mis-tagging of animals, lab errors, farm errors, and multiple other issues that can arise. We divide the pipeline into two parts: a Genotype QC and an Animal QC pipeline. The Genotype QC identifies samples with low call rate, missing or mixed genotype classes (no BB genotype or ABTG alleles present), and low genotype frequencies.
The Animal QC handles situations where the genotype might not belong to the listed individual by identifying: >1 non-matching genotypes per animal, SNP duplicates, sex and breed prediction mismatches, parentage and progeny validation results, and other situations. The Animal QC pipeline makes use of the ICBF800 SNP set where appropriate to identify errors in a computationally efficient yet still highly accurate manner.
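
The sample call rate and the ≤1% misconcordance threshold mentioned in the abstract can be illustrated with a minimal sketch. This is not the ICBF pipeline. The 0/1/2 genotype coding (count of B alleles), the -1 code for a no-call, the function names, and the definition of misconcordance as opposing homozygotes divided by the number of SNP called in both animals are all assumptions made for illustration.

```python
from typing import Sequence

MISSING = -1  # assumed code for a no-call genotype (not from the paper)


def call_rate(genotypes: Sequence[int]) -> float:
    """Fraction of SNP with a called (non-missing) genotype."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g != MISSING)
    return called / len(genotypes)


def misconcordance_rate(parent: Sequence[int], offspring: Sequence[int]) -> float:
    """Share of jointly called SNP where parent and offspring are
    opposing homozygotes (0 vs 2), which a true parent cannot produce."""
    if len(parent) != len(offspring):
        raise ValueError("genotype vectors must cover the same SNP set")
    compared = 0
    conflicts = 0
    for p, o in zip(parent, offspring):
        if p == MISSING or o == MISSING:
            continue  # skip SNP not called in both animals
        compared += 1
        if {p, o} == {0, 2}:
            conflicts += 1
    if compared == 0:
        raise ValueError("no SNP called in both animals")
    return conflicts / compared


def passes_parentage(parent: Sequence[int], offspring: Sequence[int],
                     max_rate: float = 0.01) -> bool:
    """True if the candidate parent is within the assumed 1% threshold."""
    return misconcordance_rate(parent, offspring) <= max_rate


# Toy data with 8 SNP; a real panel such as ICBF800 would use hundreds.
offspring = [0, 1, 2, 2, 0, 1, MISSING, 2]
candidate_a = [0, 1, 2, 1, 0, 2, 1, 2]  # no opposing homozygotes
candidate_b = [2, 1, 0, 1, 2, 2, 1, 0]  # several opposing homozygotes

print(call_rate(offspring))
print(passes_parentage(candidate_a, offspring))
print(passes_parentage(candidate_b, offspring))
```

With so few SNP a single conflict already exceeds 1%, which gives a rough sense of why the abstract argues that larger panels are needed to separate true parents from close relatives.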

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eae5/5862794/8037c1be62db/fgene-09-00084-g0001.jpg
