Igo Robert P, Cooke Bailey Jessica N, Romm Jane, Haines Jonathan L, Wiggs Janey L
Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio.
Center for Inherited Disease Research, Johns Hopkins University, Baltimore, Maryland.
Curr Protoc Hum Genet. 2016 Jul 1;90:2.14.1-2.14.16. doi: 10.1002/cphg.15.
The Illumina HumanExome BeadChip and other exome-based genotyping arrays offer inexpensive genotyping of some 240,000 mostly nonsynonymous coding variants across the human genome. The HumanExome chip, with its highly non-uniform distribution of markers and emphasis on rare coding variants, presents some unique challenges for quality control (QC) and data cleaning. Here, we describe QC procedures for HumanExome data, with examples of challenges specific to exome arrays drawn from our experience cleaning a data set of ∼7,500 samples from the NEIGHBORHOOD Consortium. We focus on standard QC procedures for genome-wide array data that are complicated by the HumanExome panel's enrichment in rare, exonic variation: genotype calling, sex verification, sample identity verification, relationship checking, and population structure assessment. © 2016 by John Wiley & Sons, Inc.
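To illustrate why rare variants complicate a step such as sex verification, here is a minimal, hypothetical sketch of a generic X-chromosome heterozygosity check. It is not the procedure described in this unit. It assumes X-chromosome genotypes coded as 0/1/2 copies of the alternate allele (None = missing) and estimates allele frequencies from the sample itself. It also assumes a minor allele frequency (MAF) filter, because rare variants are almost always homozygous in everyone and so carry little information about sex. The function names `allele_freqs` and `check_sex` and the cutoffs `maf_min`, `female_max`, and `male_min` are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

Genotype = Optional[int]  # 0, 1 or 2 copies of the alt allele; None = missing


def allele_freqs(genos: Dict[str, List[Genotype]], n_markers: int) -> List[float]:
    """Alt-allele frequency per marker, treating every call as diploid (an approximation on X)."""
    freqs = []
    for j in range(n_markers):
        calls = [g[j] for g in genos.values() if g[j] is not None]
        freqs.append(sum(calls) / (2 * len(calls)) if calls else 0.0)
    return freqs


def check_sex(
    genos: Dict[str, List[Genotype]],
    maf_min: float = 0.05,     # hypothetical MAF cutoff: drop rare, uninformative markers
    female_max: float = 0.2,   # F below this -> FEMALE
    male_min: float = 0.8,     # F above this -> MALE
) -> Dict[str, Tuple[float, str]]:
    """Return (inbreeding coefficient F on X, inferred sex) for each sample."""
    n_markers = len(next(iter(genos.values())))
    freqs = allele_freqs(genos, n_markers)
    # Keep only markers whose minor allele frequency reaches maf_min.
    keep = [j for j, p in enumerate(freqs) if min(p, 1 - p) >= maf_min]

    results: Dict[str, Tuple[float, str]] = {}
    for sample, g in genos.items():
        obs_hom = 0       # observed homozygous calls
        exp_hom = 0.0     # expected homozygous calls under Hardy-Weinberg
        n = 0             # non-missing informative calls
        for j in keep:
            if g[j] is None:
                continue
            p = freqs[j]
            n += 1
            obs_hom += int(g[j] != 1)
            exp_hom += 1 - 2 * p * (1 - p)
        if n == 0 or n - exp_hom <= 0:
            results[sample] = (float("nan"), "UNDETERMINED")
            continue
        # Males are hemizygous on X, so F should be near 1; females near 0.
        f = (obs_hom - exp_hom) / (n - exp_hom)
        if f > male_min:
            call = "MALE"
        elif f < female_max:
            call = "FEMALE"
        else:
            call = "UNDETERMINED"
        results[sample] = (f, call)
    return results


if __name__ == "__main__":
    toy = {
        "F1": [1, 1, 1, 0, 0],
        "F2": [1, 1, 0, 1, 0],
        "M1": [0, 2, 0, 2, 0],
        "M2": [2, 0, 2, 0, 0],  # last marker is monomorphic and gets filtered out
    }
    for s, (f, call) in check_sex(toy).items():
        print(s, round(f, 3), call)
```

The 0.2/0.8 cutoffs follow the defaults commonly used by PLINK's `--check-sex`. On an exome array, few X-chromosome markers survive a MAF filter, so F rests on fewer informative markers than on a typical genome-wide array. The sketch makes that trade-off explicit rather than hiding it.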