Sealock Julia M, Ivankovic Franjo, Liao Calwing, Chen Siwei, Churchhouse Claire, Karczewski Konrad J, Howrigan Daniel P, Neale Benjamin M
Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA.
Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Nat Protoc. 2025 Mar 28. doi: 10.1038/s41596-025-01169-1.
Genetic sequencing technologies are powerful tools for identifying rare variants and genes associated with Mendelian and complex traits; indeed, whole-exome and whole-genome sequencing are increasingly popular methods for population-scale genetic studies. However, careful quality control steps should be taken to ensure study accuracy and reproducibility, and sequencing data require extensive quality filtering to distinguish true variants from technical artifacts. Although the pipelines that call variants from sequencing reads follow harmonized processing standards, there is currently no standardized pipeline for quality filtering of variant-level datasets for population-scale association analysis. In this Tutorial, we discuss key quality control parameters, provide guidelines for conducting quality filtering of samples and variants, and compare commonly used software programs for quality control of samples, variants and genotypes from sequencing data. As sequencing data continue to gain popularity in genetic research, establishing standardized quality control practices is crucial to ensure consistent, reliable and reproducible results across studies.