Blumhagen Rachel Z, Schwartz David A, Langefeld Carl D, Fingerlin Tasha E
Center for Genes, Environment and Health, National Jewish Health, Denver, Colorado, USA
Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, Colorado, USA
Hum Hered. 2021 Feb 10:1-13. doi: 10.1159/000513290.
Studies that examine the role of rare variants in both simple and complex diseases are increasingly common. Although testing rare variants in aggregate sets is usually more powerful than testing individual variants, it is still of interest to identify which variants plausibly drive the association. We present a novel method for prioritizing rare variants after a significant aggregate test by quantifying the influence of each variant on the aggregate test of association.
In addition to providing a measure for ranking variants, we use outlier detection methods to build the computationally efficient Rare Variant Influential Filtering Tool (RIFT), which identifies the subset of variants that influence the disease association. We evaluated several outlier detection methods that differ in their underlying measure of spread: the interquartile range (Tukey fences), the median absolute deviation, and the SD. We performed 1,000 simulations for 50 regions of 3 kb each and compared true and false positive rates. We compared RIFT using the inner Tukey fence with 2 existing methods: the adaptive combination of p values (ADA) and a Bayesian hierarchical model (BeviMed). Finally, we applied the method to data from our targeted resequencing study of idiopathic pulmonary fibrosis (IPF).
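The three spread measures named above translate into simple upper cutoffs on the per-variant influence scores. A minimal sketch follows, assuming only large positive scores are of interest; the function name `upper_outliers`, the multiplier `k=3` for the MAD and SD rules, and the 1.4826 MAD scaling are illustrative assumptions, not parameters taken from the paper. The 1.5×IQR (inner) and 3×IQR (outer) Tukey multipliers are the conventional definitions.

```python
import numpy as np


def upper_outliers(scores, method="tukey_inner", k=3.0):
    """Indices of scores above an upper outlier cutoff.

    tukey_inner: Q3 + 1.5 * IQR      tukey_outer: Q3 + 3 * IQR
    mad:         median + k * 1.4826 * MAD
    sd:          mean + k * SD
    """
    x = np.asarray(scores, dtype=float)
    if method in ("tukey_inner", "tukey_outer"):
        q1, q3 = np.percentile(x, [25, 75])
        multiplier = 1.5 if method == "tukey_inner" else 3.0
        cutoff = q3 + multiplier * (q3 - q1)
    elif method == "mad":
        median = np.median(x)
        # 1.4826 rescales the MAD so it estimates the SD under normality.
        mad = 1.4826 * np.median(np.abs(x - median))
        cutoff = median + k * mad
    elif method == "sd":
        cutoff = x.mean() + k * x.std(ddof=1)
    else:
        raise ValueError(f"unknown method: {method}")
    return np.flatnonzero(x > cutoff)


scores = [0.10, 0.20, 0.15, 0.05, 0.12, 0.18, 0.09, 5.0]
for m in ("tukey_inner", "tukey_outer", "mad", "sd"):
    print(m, upper_outliers(scores, m))
```

In this toy example the Tukey and MAD rules flag the extreme score (index 7), but the SD rule with `k=3` flags nothing, because the single extreme value inflates the SD it is measured against. Quartile- and median-based cutoffs are less prone to this masking effect.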
All outlier detection methods had higher sensitivity to detect uncommon variants (0.001 < minor allele frequency [MAF] < 0.03) than very rare variants (MAF < 0.001). For uncommon variants, RIFT had a lower median false positive rate than ADA. ADA and RIFT had significantly higher true positive rates than BeviMed. When applied to 2 regions previously found to be associated with IPF, comprising 100 rare variants, we identified 6 polymorphisms with the greatest evidence for influencing the association with IPF.
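The comparison above rests on true and false positive rates computed separately within MAF classes. As a minimal sketch, assuming each simulated variant carries a known causal label and a flagged/not-flagged call, the rates can be tabulated as below. The function names are hypothetical, and the bin boundaries are taken literally from the abstract, so a MAF of exactly 0.001 falls in neither class.

```python
import numpy as np


def maf_class(maf):
    # Bins follow the abstract: very rare MAF < 0.001; uncommon 0.001 < MAF < 0.03.
    if maf < 0.001:
        return "very rare"
    if 0.001 < maf < 0.03:
        return "uncommon"
    return "other"


def rates_by_class(maf, is_causal, is_flagged):
    """(true positive rate, false positive rate) of the flagged set, per MAF class."""
    causal = np.asarray(is_causal, dtype=bool)
    flagged = np.asarray(is_flagged, dtype=bool)
    classes = np.array([maf_class(m) for m in maf])
    rates = {}
    for c in ("very rare", "uncommon"):
        pos = (classes == c) & causal    # truly causal variants in this class
        neg = (classes == c) & ~causal   # non-causal variants in this class
        tpr = float((flagged & pos).sum() / pos.sum()) if pos.any() else float("nan")
        fpr = float((flagged & neg).sum() / neg.sum()) if neg.any() else float("nan")
        rates[c] = (tpr, fpr)
    return rates


maf = [0.0005, 0.0005, 0.0008, 0.01, 0.02, 0.02, 0.05]
causal = [1, 0, 0, 1, 1, 0, 0]
flagged = [0, 0, 1, 1, 1, 0, 1]
print(rates_by_class(maf, causal, flagged))
```

Averaging these per-replicate rates (or taking their median, as the abstract reports for false positives) across simulation replicates gives the summary figures used to compare methods.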
In summary, RIFT has a high true positive rate while maintaining a low false positive rate for identifying polymorphisms that influence rare variant association tests. This work provides an approach to obtain greater resolution of rare variant signals within significant aggregate sets; this information can provide an objective measure for prioritizing variants for follow-up experimental studies and offer insight into the biological pathways involved.