Saad Mohamed N, Mabrouk Mai S, Eldeib Ayman M, Shaker Olfat G
Biomedical Engineering Department, Faculty of Engineering, Minia University, Minia, Egypt.
Biomedical Engineering Department, Faculty of Engineering, Misr University for Science and Technology, 6th of October City, Egypt.
J Adv Res. 2019 Jan 18;18:113-126. doi: 10.1016/j.jare.2019.01.006. eCollection 2019 Jul.
The human genome, which includes thousands of genes, represents a big data challenge. Rheumatoid arthritis (RA) is a complex autoimmune disease with a genetic basis. Many single-nucleotide polymorphism (SNP) association methods partition a genome into haplotype blocks. The aim of this genome-wide association study (GWAS) was to select the most appropriate haplotype block partitioning method for the North American Rheumatoid Arthritis Consortium (NARAC) dataset. The methods applied to the NARAC dataset were the individual SNP approach and the following haplotype block methods: the four-gamete test (FGT), the confidence interval test (CIT), and the solid spine of linkage disequilibrium (SSLD). The strength of the association between each biomarker and RA was measured by the P-value after Bonferroni correction, and other parameters were used to compare the output of each haplotype block method. This work compares the individual SNP approach with the three haplotype block methods in order to select the method that, when applied alone, can detect all the significant SNPs. The GWAS results obtained from the NARAC dataset with the different methods are presented. The individual SNP, CIT, FGT, and SSLD methods detected 541, 1516, 1551, and 1831 RA-associated SNPs, respectively, and the individual SNP, FGT, CIT, and SSLD methods detected 65, 156, 159, and 450 significant SNPs, respectively, that were not detected by the other methods. Three hundred eighty-three SNPs were detected by all three haplotype block methods and the individual SNP approach, while 1021 SNPs were detected by all three haplotype block methods. The 383 SNPs detected by all the methods are promising candidates for studying RA susceptibility. A hybrid technique involving all four methods should be applied to detect the significant SNPs associated with RA in the NARAC dataset, but the SSLD method may be preferred, because of its advantages, when only one method is used.
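As an illustration of how a haplotype block method such as the FGT can partition a region, the following is a minimal sketch and not the implementation used in this study. It assumes phased haplotypes coded as 0/1 at biallelic SNPs, and it closes a block as soon as a new SNP shows all four two-locus gametes with any SNP already in the block. The function names and the `min_freq` threshold are hypothetical and chosen only for this example.

```python
def four_gametes_present(haplotypes, i, j, min_freq=0.0):
    """Return True if all four two-locus gametes are seen at SNPs i and j.

    haplotypes: list of equal-length sequences of 0/1 alleles (assumed phased).
    min_freq:   hypothetical threshold; a gamete counts only if its
                frequency is strictly greater than this value.
    """
    counts = {}
    for hap in haplotypes:
        pair = (hap[i], hap[j])
        counts[pair] = counts.get(pair, 0) + 1
    total = len(haplotypes)
    common = [g for g, c in counts.items() if c / total > min_freq]
    # With biallelic SNPs there are at most four possible gametes.
    return len(common) == 4


def fgt_blocks(haplotypes, min_freq=0.0):
    """Greedy left-to-right partition into blocks of SNP indices.

    A block is extended while the next SNP shows at most three gametes
    with every SNP already in the block; otherwise a new block starts.
    Returns a list of (first_index, last_index) tuples, inclusive.
    """
    n_snps = len(haplotypes[0])
    blocks = []
    start = 0
    for k in range(1, n_snps):
        recombined = any(
            four_gametes_present(haplotypes, i, k, min_freq)
            for i in range(start, k)
        )
        if recombined:
            blocks.append((start, k - 1))
            start = k
    blocks.append((start, n_snps - 1))
    return blocks


# Toy data: SNPs 0 and 1 always travel together, SNP 2 does not.
toy_haplotypes = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
toy_blocks = fgt_blocks(toy_haplotypes)
```

In this toy example SNPs 0 and 1 show only two gametes and stay in one block, while SNP 2 shows all four gametes with SNP 0 and therefore starts a new block. Published FGT implementations may differ in details such as the frequency threshold and how block boundaries are chosen.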