Quality Control of Common and Rare Variants.

Authors

Panoutsopoulou Kalliope, Walter Klaudia

Affiliation

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, Cambridgeshire, United Kingdom.

Publication

Methods Mol Biol. 2018;1793:25-36. doi: 10.1007/978-1-4939-7868-7_3.

Abstract

Thorough data quality control (QC) is key to the success of high-throughput genotyping approaches. Following extensive research, several criteria and thresholds have been established for data QC at the sample and variant level. Sample QC aims to identify and, when appropriate, remove individuals with (1) low call rate, (2) discrepant sex or other identity-related information, (3) excess genome-wide heterozygosity and homozygosity, (4) relations to other samples, (5) ethnicity differences, (6) batch effects, and (7) contamination. Variant QC aims to identify and remove or refine variants with (1) low call rate, (2) call rate differences by phenotypic status, (3) gross deviation from Hardy-Weinberg Equilibrium (HWE), (4) poor genotype intensity plots, (5) batch effects, (6) allele frequency differences from published data sets, (7) very low minor allele counts (MAC), (8) low imputation quality score, (9) low variant quality score log-odds, and (10) few or low-quality reads.
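A few of the listed criteria (call rate, minor allele count, heterozygosity, and deviation from HWE) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The genotype coding (0, 1, 2 alternate-allele copies, `None` for a missing call) and all function names are assumptions made for illustration, and no thresholds are applied because the abstract does not state any. The HWE check uses a simple 1-degree-of-freedom chi-square approximation; an exact test is generally preferable when genotype counts are low.

```python
from math import erfc, sqrt

# Assumed coding: each genotype is the number of alternate alleles
# (0, 1, or 2), or None when the call is missing.


def call_rate(genotypes):
    """Fraction of non-missing calls for one sample or one variant."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_count(genotypes):
    """Count of the less frequent allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    return min(alt, 2 * len(called) - alt)


def heterozygosity(genotypes):
    """Proportion of called genotypes that are heterozygous."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    return sum(g == 1 for g in called) / len(called)


def hwe_chi2_pvalue(genotypes):
    """P-value of a 1-df chi-square goodness-of-fit test against HWE."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    p = sum(called) / (2 * n)  # alternate allele frequency
    q = 1.0 - p
    if p == 0.0 or p == 1.0:
        return 1.0  # monomorphic: no deviation can be tested
    observed = [called.count(0), called.count(1), called.count(2)]
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df.
    return erfc(sqrt(chi2 / 2))
```

In practice, each function would be applied across the rows (variants) or columns (samples) of a genotype matrix, and the results compared against thresholds chosen for the study at hand.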
