Comparison of mixed model based approaches for correcting for population substructure with application to extreme phenotype sampling.

Affiliations

Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada.

School of Epidemiology and Public Health, University of Ottawa, Ottawa, Canada.

Publication information

BMC Genomics. 2022 Feb 4;23(1):98. doi: 10.1186/s12864-022-08297-y.

Abstract

BACKGROUND

Mixed models are used to correct for confounding due to population stratification and hidden relatedness in genome-wide association studies. This class of models includes linear mixed models and generalized linear mixed models. Existing mixed model approaches to correcting for population substructure have previously been investigated with both continuous and case-control response variables. However, they have not been investigated in the context of extreme phenotype sampling (EPS), where genetic covariates are collected only on samples with extreme values of the response variable. In this work, we compare the performance of existing binary trait mixed model approaches (GMMAT, LEAP and CARAT) on EPS data. Since linear mixed models are commonly used even with binary traits, we also evaluate the performance of a popular linear mixed model implementation (GEMMA).
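As a rough illustration of the EPS design described above, the Python sketch below keeps only the lowest and highest tails of a simulated continuous phenotype and labels them 0 and 1, so the retained sample can be analysed like case-control data with a binary trait method. The function name `extreme_phenotype_sample` and the 10% tail fraction are hypothetical choices for illustration, not the sampling design used in the study.

```python
import random


def extreme_phenotype_sample(phenotypes, fraction=0.1):
    """Return (index, label) pairs for the lower and upper `fraction` tails.

    Label 0 marks the low extreme and 1 the high extreme; the middle of the
    distribution is discarded, mimicking EPS where genotypes are collected
    only for individuals with extreme phenotypes. Hypothetical helper.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    n_tail = int(len(phenotypes) * fraction)
    # Indices ordered from smallest to largest phenotype value.
    order = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i])
    low = [(i, 0) for i in order[:n_tail]]
    high = [(i, 1) for i in order[len(order) - n_tail:]]
    return low + high


rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(1000)]
sample = extreme_phenotype_sample(y, fraction=0.1)
print(len(sample))  # 200
```

Dichotomizing the two tails is what lets the binary trait methods named above be applied to EPS data; the discarded middle is never genotyped.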

RESULTS

We used simulation studies to estimate the type I error rate and power of all approaches, assuming a population with substructure. Our simulation results show that, for a common candidate variant, both LEAP and GMMAT control the type I error rate while CARAT's rate remains inflated. We applied all methods to a real dataset from a case-control study in Québec, Canada, that is known to have population substructure. Analysis of the Québec dataset shows similar type I error control. For rare variants, the false positive rate remains inflated even after correction with mixed model approaches. Among the methods that control the type I error rate, the estimated power is comparable.
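To make the simulated type I error and power estimates more concrete, the sketch below computes the empirical rejection rate of a set of p-values (the type I error rate under the null, power under an alternative) and shows how population substructure alone can inflate it under extreme sampling. It assumes two equally sized hypothetical subpopulations whose allele frequencies and phenotype means differ, a SNP with no effect, and an unadjusted allelic z-test. It does not implement GMMAT, LEAP, CARAT or GEMMA, which are the corrections the study evaluates; all names and parameter values are illustrative.

```python
import math
import random


def allelic_z_test(geno_high, geno_low):
    """Two-sided two-proportion z-test on allele counts (genotypes coded 0/1/2)."""
    n1, n2 = 2 * len(geno_high), 2 * len(geno_low)
    p1, p2 = sum(geno_high) / n1, sum(geno_low) / n2
    pooled = (sum(geno_high) + sum(geno_low)) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # monomorphic SNP: no evidence of association
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def empirical_rejection_rate(p_values, alpha=0.05):
    """Share of p-values below alpha: type I error under H0, power under H1."""
    return sum(p < alpha for p in p_values) / len(p_values)


def simulate_null_pvalue(rng, n=2000, fraction=0.1, freqs=(0.1, 0.5), shift=1.0):
    """One null replicate (hypothetical setup): the SNP has no effect on the
    phenotype, but allele frequency and mean phenotype both differ between
    two subpopulations, so subpopulation membership confounds the test."""
    genos, phenos = [], []
    for i in range(n):
        pop = i % 2  # alternate individuals between subpopulations 0 and 1
        f = freqs[pop]
        genos.append((rng.random() < f) + (rng.random() < f))  # two independent alleles
        phenos.append(rng.gauss(shift * pop, 1.0))  # depends on subpopulation only
    order = sorted(range(n), key=lambda i: phenos[i])
    k = int(n * fraction)
    low = [genos[i] for i in order[:k]]
    high = [genos[i] for i in order[n - k:]]
    return allelic_z_test(high, low)


rng = random.Random(2022)
null_p = [simulate_null_pvalue(rng) for _ in range(100)]
print("unadjusted type I error:", empirical_rejection_rate(null_p))
```

With these settings the unadjusted test should reject far more often than the nominal 0.05, because the extreme tails are enriched for different subpopulations. Setting `freqs=(0.3, 0.3)` removes the allele frequency difference, and the rate should then fall back near alpha.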

CONCLUSIONS

The methods compared in this study differ in their type I error control. Therefore, when data are from an EPS study, care should be taken to ensure that the models underlying the methodology are suitable for the sampling strategy and for the minor allele frequency of the candidate SNPs.
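The conclusion ties the choice of method to the minor allele frequency (MAF) of the candidate SNPs. A minimal sketch of computing MAF from genotypes coded 0/1/2 and flagging rare variants follows. The 0.01 cutoff is a common convention assumed here for illustration, not necessarily the threshold used in the study, and the helper names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0/1/2 copies of one allele; None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)  # the minor allele is whichever is less frequent


def is_rare(genotypes, threshold=0.01):
    """Flag a SNP as rare when its MAF falls below a (hypothetical) threshold."""
    return minor_allele_frequency(genotypes) < threshold


print(minor_allele_frequency([0, 1, 2, 2]))  # 0.375
print(is_rare([0] * 99 + [1]))  # True
```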

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3fb/8815214/b9adc449f765/12864_2022_8297_Fig1_HTML.jpg
