Sun Ting-Hsuan, Shao Yu-Hsuan Joni, Mao Chien-Lin, Hung Miao-Neng, Lo Yi-Yun, Ko Tai-Ming, Hsiao Tzu-Hung
Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan.
Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan.
Front Genet. 2021 Oct 26;12:736390. doi: 10.3389/fgene.2021.736390. eCollection 2021.
Single-nucleotide polymorphism (SNP) arrays are an ideal technology for genotyping genetic variants in mass screening. However, using SNP arrays to detect rare variants [those with a minor allele frequency (MAF) of <1%] remains a challenge because of noisy signals and batch effects. An approach that improves genotyping quality is needed for clinical applications. We developed a quality-control procedure for rare variants that integrates different algorithms, filters, and experiments to increase the accuracy of variant calling. Using data from the TWB 2.0 custom Axiom array, we applied an advanced normalization adjustment, which prevents false calls caused by cluster splitting, and a rare het adjustment, which reduces false calls in rare variants. Allele frequencies from the array data were compared with those from sequencing datasets of Taiwanese individuals to assess their concordance. Finally, genotyping results were used to detect familial hypercholesterolemia (FH), thrombophilia (TH), and maturity-onset diabetes of the young (MODY) to assess performance in disease screening. All heterozygous calls were verified by Sanger sequencing or qPCR. The positive predictive value (PPV) of each step was estimated to evaluate the performance of our procedure. We analyzed SNP array data from 43,433 individuals, covering 267,247 rare variants. The advanced normalization and rare het adjustment methods adjusted the genotype calls of 168,134 variants (96.49%). We further removed 3,916 probesets whose MAFs were discordant between the SNP array and sequencing data. The PPV for detecting pathogenic variants with 0.01%<MAF≤1% exceeded 99.37%. PPVs for those with an MAF of ≤0.01% improved from 95% to 100% for FH, from 42.11% to 85.19% for TH, and from 18.24% to 72.22% for MODY after adopting our rare variant quality-control procedure and experimental verification. With our quality-control procedure, SNP arrays can adequately detect variants with MAF values ranging from 0.01% to 0.1%. For variants with MAF values of ≤0.01%, experimental validation is needed unless sequencing data from a homogeneous population of >10,000 individuals are available. These results demonstrate that our procedure can call the genotypes of rare variants correctly. It provides a solution for detecting pathogenic variants with SNP arrays, and the approach holds promise for implementing precision medicine in medical practice.
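The two quantities the abstract relies on, MAF and PPV, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names and the example counts are hypothetical, and the only figures taken from the abstract are the cohort size (43,433) and the MAF cut-offs (0.01% and 1%). PPV is computed here in its standard form, confirmed calls divided by all positive calls.

```python
def minor_allele_frequency(n_het: int, n_hom_alt: int, n_samples: int) -> float:
    """MAF from diploid genotype counts (hypothetical helper, not from the paper)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    # Each heterozygote carries one alternate allele, each homozygote two.
    alt_freq = (n_het + 2 * n_hom_alt) / (2 * n_samples)
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def maf_bin(maf: float) -> str:
    """Assign a variant to the MAF ranges used in the abstract."""
    if maf <= 0.0001:        # MAF <= 0.01%
        return "MAF<=0.01%"
    if maf <= 0.01:          # 0.01% < MAF <= 1%
        return "0.01%<MAF<=1%"
    return "MAF>1%"


def positive_predictive_value(confirmed_calls: int, positive_calls: int) -> float:
    """PPV = calls confirmed by an orthogonal assay / all positive array calls."""
    if positive_calls <= 0:
        raise ValueError("positive_calls must be positive")
    if not 0 <= confirmed_calls <= positive_calls:
        raise ValueError("confirmed_calls must lie between 0 and positive_calls")
    return confirmed_calls / positive_calls


# Illustrative counts only: one heterozygous carrier in a cohort of 43,433.
maf_single_carrier = minor_allele_frequency(n_het=1, n_hom_alt=0, n_samples=43433)
print(maf_bin(maf_single_carrier))

# Illustrative counts only: 45 of 50 heterozygous calls confirmed by sequencing.
print(f"{positive_predictive_value(45, 50):.2%}")
```

A single carrier among 43,433 individuals corresponds to a MAF of roughly 0.001%, which places the variant in the ≤0.01% range where the abstract recommends experimental validation.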