Li Weili, Dobbins Sara, Tomlinson Ian, Houlston Richard, Pal Deb K, Strug Lisa J
Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ont., Canada.
Hum Hered. 2015;79(1):5-13. doi: 10.1159/000371579. Epub 2015 Feb 3.
Prioritizing individual rare variants within associated genes or regions often consists of an ad hoc combination of statistical and biological considerations. From the statistical perspective, rare variants are often ranked using Fisher's exact p values, which can lead to different rankings of the same set of variants depending on whether 1- or 2-sided p values are used.
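The ranking discrepancy described above can be illustrated with a minimal sketch. It assumes the standard hypergeometric formulation of Fisher's exact test and the common "sum of tables no more probable than the observed one" convention for the 2-sided p value. The carrier counts for the two variants are hypothetical and chosen only to make the contrast visible. Only the sample sizes (27 cases, 200 controls) come from the study. The function and variable names are illustrative, not taken from the paper.

```python
from math import comb


def fisher_exact_p(case_carriers, control_carriers, n_cases, n_controls):
    """Return (one_sided, two_sided) Fisher's exact p values for a 2x2 table.

    one_sided tests for enrichment of carriers among cases.
    two_sided sums the probabilities of all tables that are no more
    probable than the observed one (one common convention; an assumption here).
    """
    total_carriers = case_carriers + control_carriers
    total = n_cases + n_controls
    k_min = max(0, total_carriers - n_controls)
    k_max = min(n_cases, total_carriers)

    # Hypergeometric probability of k carriers among cases, margins fixed.
    probs = {
        k: comb(n_cases, k) * comb(n_controls, total_carriers - k)
        / comb(total, total_carriers)
        for k in range(k_min, k_max + 1)
    }
    p_obs = probs[case_carriers]
    one_sided = sum(p for k, p in probs.items() if k >= case_carriers)
    # Small relative tolerance guards against floating-point ties.
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return one_sided, two_sided


N_CASES, N_CONTROLS = 27, 200  # sample sizes reported in the abstract

# Hypothetical carrier counts: (carriers among cases, carriers among controls).
variants = {
    "A": (2, 2),   # enriched in cases
    "B": (0, 60),  # absent in cases, frequent in controls
}

p_values = {
    name: fisher_exact_p(a, b, N_CASES, N_CONTROLS)
    for name, (a, b) in variants.items()
}

# Rank variants from smallest to largest p value under each convention.
order_one_sided = sorted(p_values, key=lambda v: p_values[v][0])
order_two_sided = sorted(p_values, key=lambda v: p_values[v][1])

print("1-sided ranking:", order_one_sided)
print("2-sided ranking:", order_two_sided)
```

With these hypothetical counts, the 1-sided p value places variant A first, whereas the 2-sided p value places variant B first, because the 2-sided test also rewards strong depletion of carriers among cases.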
We propose a likelihood ratio-based measure, maxLRc, for the statistical component of ranking rare variants under a case-control study design; the measure avoids the hypothesis-testing paradigm. We prove analytically that the maxLRc is always well-defined, even when the 2×2 disease-variant table contains zero cell counts. Via simulation, we show that the maxLRc outperforms Fisher's exact p values in most practical scenarios considered. Using next-generation sequence data from 27 rolandic epilepsy cases and 200 controls in a region previously shown to be linked to and associated with rolandic epilepsy, we demonstrate that rankings assigned by the maxLRc and exact p values can differ substantially.
The maxLRc provides reliable statistical prioritization of rare variants using only the observed data. It avoids the need to specify hypothesis-testing parameters, such as the choice between 1- and 2-sided p values, that can result in ranking discrepancies across p value procedures. It is also applicable to common variant prioritization.