Greco Brian, Luedtke Alexander, Hainline Allison, Alvarez Carolina, Beck Andrew, Tintle Nathan L
Department of Mathematics and Statistics, Grinnell College, 1115 8th Ave, Grinnell, IA 50112, USA.
Division of Biostatistics, UC Berkeley, 367 Evans Hall, Berkeley, CA 94720, USA.
BMC Proc. 2014 Jun 17;8(Suppl 1 Genetic Analysis Workshop 18):S105. doi: 10.1186/1753-6561-8-S1-S105. eCollection 2014.
Pathway analysis approaches for sequence data typically operate either in a single stage (all variants within all genes in the pathway are combined into a single, very large set of variants that can then be analyzed using standard "gene-based" test statistics) or in 2 stages (gene-based p values are computed for all genes in the pathway, and the gene-based p values are then combined into a single pathway p value). To date, little consideration has been given to the performance of gene-based tests, which are typically designed for a smaller number of single-nucleotide variants (SNVs), when the number of SNVs in the gene or in the pathway is very large and the genotypes come from sequence data organized in large pedigrees. We consider recently proposed gene-based tests for rare variants from complex pedigrees that test for association between a large set of SNVs and a qualitative phenotype of interest (1-stage analyses), as well as 2-stage approaches. We find that many of these methods show inflated type I error rates when the number of SNVs in the gene or the pathway is large (>200 SNVs) and standard approaches are used to estimate the genotype covariance matrix. Alternative methods are needed when testing very large sets of SNVs in 1-stage approaches.
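To make the 2-stage idea concrete, the following minimal sketch combines gene-based p values into a single pathway p value. The abstract does not say which combination rule the authors used; Fisher's method is assumed here purely for illustration, and the function name `fisher_combine` and the example p values are hypothetical. Fisher's method also assumes the gene-based p values are independent, which may not hold for genes in linkage disequilibrium or for pedigree data.

```python
import math


def fisher_combine(p_values):
    """Combine p values with Fisher's method (assumes independence).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(p_values).
    """
    if not p_values:
        raise ValueError("need at least one p value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2

    # For even degrees of freedom 2k, the chi-square survival function is
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical stage-1 output: one gene-based p value per gene in the pathway.
gene_p_values = {"GENE_A": 0.03, "GENE_B": 0.40, "GENE_C": 0.12}
pathway_p = fisher_combine(list(gene_p_values.values()))
print(f"pathway p value: {pathway_p:.4f}")
```

A 1-stage analysis would instead pool every SNV in the pathway into one set and apply a single gene-based test to it, which is the setting where the abstract reports inflated type I error for large SNV sets.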