Zhou Hua, Zhou Jin, Hu Tao, Sobel Eric M, Lange Kenneth
Department of Biostatistics, University of California, Los Angeles, CA 90095 USA.
Division of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, Tucson, AZ 85721-0066 USA.
BMC Proc. 2016 Oct 18;10(Suppl 7):239-244. doi: 10.1186/s12919-016-0037-6. eCollection 2016.
The pedigree genome-wide association study (GWAS) option (Option 29) in the current version of the Mendel software is an optimized subroutine for performing large-scale genome-wide quantitative trait locus (QTL) analysis. This analysis (a) works for random sample data, pedigree data, or a mix of both; (b) is highly efficient in both run time and memory use; (c) accommodates both univariate and multivariate traits; (d) works for autosomal and X-linked loci; (e) correctly handles missing data in traits, covariates, and genotypes; (f) allows for covariate adjustment and constraints among parameters; (g) uses either a theoretical or a single nucleotide polymorphism (SNP)-based empirical kinship matrix for additive polygenic effects; (h) allows extra variance components such as dominant polygenic effects and household effects; (i) detects and reports outlier individuals and pedigrees; and (j) allows for robust estimation via the t-distribution. This paper assesses these capabilities on the Genetic Analysis Workshop 19 (GAW19) sequencing data. We analyzed simulated and real phenotypes for both family and random sample data sets. For instance, when jointly testing the 8 longitudinally measured systolic blood pressure and diastolic blood pressure traits, Mendel takes 78 min on a standard laptop computer to read, quality check, and analyze a data set with 849 individuals and 8.3 million SNPs. Genome-wide expression QTL analysis of 20,643 expression traits on 641 individuals with 8.3 million SNPs takes 30 h using 20 parallel runs on a cluster. Mendel is freely available at http://www.genetics.ucla.edu/software.