Department of Cancer Biology, Vanderbilt University, Nashville, TN, USA.
Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA.
Brief Bioinform. 2018 Sep 28;19(5):765-775. doi: 10.1093/bib/bbx012.
Illumina genotyping arrays have powered thousands of large-scale genome-wide association studies over the past decade. Yet, because of the tremendous volume and complicated genetic assumptions of Illumina genotyping data, processing and quality control (QC) of these data remain a challenge. Thorough QC ensures the accurate identification of single-nucleotide polymorphisms and is required for the correct interpretation of genetic association results. By processing genotyping data on >100 000 subjects from >10 major Illumina genotyping arrays, we have accumulated extensive experience in handling some of the most peculiar scenarios related to the processing and QC of Illumina genotyping data. Here, we describe strategies for processing Illumina genotyping data from the raw data to an analysis-ready format, and we elaborate on the QC procedures required at each processing step. High-quality Illumina genotyping data sets can be obtained by following our detailed QC strategies.