Zhao Shilin, Jiang Limin, Yu Hui, Guo Yan
Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN.
Department of Internal Medicine, University of New Mexico, Comprehensive Cancer Center, Albuquerque, NM.
J Genomics. 2022 Feb 14;10:39-44. doi: 10.7150/jgen.69860. eCollection 2022.
Genotyping arrays are the most economical approach for conducting large-scale genome-wide genetic association studies. Thorough quality control is key to generating high-integrity genotyping data and robust results. Quality control of genotyping arrays is generally a complicated process, as implementing the established protocols and curating a comprehensive quality report require intensive manual labor. There is an urgent need to reduce manual intervention through an automated quality control process. Based on previously established protocols and strategies, we developed an R package, GTQC (GenoTyping Quality Control), to automate the majority of the quality control steps for general array genotyping data. GTQC covers a comprehensive spectrum of genotype data quality metrics and produces a detailed HTML report comprising tables and figures. Here, we describe the concepts underpinning GTQC and demonstrate its effectiveness using a real genotyping dataset. By streamlining these steps and reporting on a wide range of quality control metrics, GTQC enables swift and rigorous inspection of data quality prior to downstream GWAS and related analyses. By substantially reducing the time spent on genotyping quality control procedures, GTQC helps make fuller use of available resources and reduces wasted or inefficiently allocated manual effort. The GTQC tool can be accessed at https://github.com/slzhao/GTQC.
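To make the notion of a genotype quality metric concrete, the following is a minimal Python sketch of two checks commonly applied to array data: per-SNP call rate and minor allele frequency (MAF). It is a generic illustration only and is not GTQC's code or API. The function names, the 0/1/2 genotype coding with `None` for a missing call, the SNP identifiers, and the thresholds (0.95 call rate, 0.01 MAF) are all assumptions chosen for the example.

```python
from typing import Dict, List, Optional

# Assumed coding: 0, 1, or 2 copies of the alternate allele; None = no call.
Genotype = Optional[int]


def snp_call_rate(calls: List[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype call at one SNP."""
    if not calls:
        return 0.0
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """Frequency of the less common allele among called genotypes."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    # Each diploid sample contributes two alleles.
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def flag_snps(
    genotypes: Dict[str, List[Genotype]],
    min_call_rate: float = 0.95,  # hypothetical threshold
    min_maf: float = 0.01,        # hypothetical threshold
) -> Dict[str, List[str]]:
    """Map each SNP to the list of checks it fails (empty list = passes)."""
    flags: Dict[str, List[str]] = {}
    for snp, calls in genotypes.items():
        reasons = []
        if snp_call_rate(calls) < min_call_rate:
            reasons.append("low_call_rate")
        if minor_allele_frequency(calls) < min_maf:
            reasons.append("low_maf")
        flags[snp] = reasons
    return flags


# Toy data with made-up SNP identifiers, for illustration only.
example_genotypes = {
    "snp_a": [0, 1, 2, 1, 0, 0, 1, 2],        # fully called, common variant
    "snp_b": [0, 0, None, None, 0, 1, 0, 0],  # two missing calls out of eight
    "snp_c": [0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic
}
example_flags = flag_snps(example_genotypes)
```

In practice such per-SNP checks are only one part of a quality control workflow, and appropriate thresholds depend on the study design and array platform.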