Genetic association meta-analysis is susceptible to confounding by between-study cryptic relatedness.

Author information

Tu Tiffany, Ochoa Alejandro

Affiliations

Program of Computational Biology and Bioinformatics, Duke University, Durham, NC.

Department of Biostatistics and Bioinformatics, Duke University, Durham, NC.

Publication information

bioRxiv. 2025 May 12:2025.05.10.653279. doi: 10.1101/2025.05.10.653279.

Abstract

Meta-analysis of Genome-Wide Association Studies (GWAS) has important advantages, but it assumes that studies are independent, which does not hold when there is relatedness between studies. As a motivating example, recent work suggested applying sex-stratified meta-analysis to correct for participation bias, without considering that men and women from the same population will be highly related. Our theory demonstrates how cryptic relatedness results in correlated test statistics between studies, inflating meta-analysis test statistics. We characterize the effects of different between-study relatedness scenarios, particularly population structure and recent family relatedness, on meta-analysis type I error control and power. We simulated data with (1) no family relatedness between subpopulations, (2) family relatedness within subpopulations, (3) family relatedness across subpopulations, and (4) a single population with family relatedness. We ran joint and meta-analyses on the simulations using both binary and quantitative traits. In scenarios with family relatedness, sex-stratified meta-analysis exhibits severe inflation and lower AUC compared to joint and subpopulation meta-analyses. Remarkably, genomic control succeeds in correcting inflation in these cases, but does not alter calibrated power. Analysis of real datasets confirms severe inflation for sex-stratified meta-analysis in family studies, but a negligible effect for population studies with up to 10,000 individuals. Our theoretical framework demonstrates that the inflation factor increases as the sample size increases. We recommend against meta-analyzing studies that share the same populations, which increases the risk of inflation due to cryptic relatedness between studies.
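The link between correlated test statistics and inflation can be illustrated with a toy sketch. This is not the authors' simulation or theory: it assumes two studies whose null z-scores are standard normal with a fixed correlation `rho` (a hypothetical stand-in for the effect of between-study relatedness) and combines them with an equal-weight Stouffer meta-analysis. Under those assumptions the combined statistic has variance 1 + rho rather than 1, so the median-based inflation factor should land near 1 + rho. The function names and parameters are illustrative only.

```python
import math
import random
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def simulate_null_meta_z(rho, n_variants=20000, seed=0):
    """Equal-weight Stouffer z-scores for two null studies with correlation rho.

    rho is an assumed, fixed correlation between the studies' z-scores;
    it is a toy stand-in, not a quantity estimated from relatedness.
    """
    rng = random.Random(seed)
    meta = []
    for _ in range(n_variants):
        z1 = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # z2 is standard normal and has correlation rho with z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * noise
        # A meta-analysis that treats the two studies as independent.
        meta.append((z1 + z2) / math.sqrt(2.0))
    return meta


def inflation_factor(z_scores):
    """Median chi-square statistic divided by its expected null median."""
    return statistics.median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def genomic_control(z_scores):
    """Rescale z-scores so that the inflation factor becomes 1 (if it exceeded 1)."""
    lam = max(inflation_factor(z_scores), 1.0)
    return [z / math.sqrt(lam) for z in z_scores]


if __name__ == "__main__":
    for rho in (0.0, 0.1, 0.3):
        z = simulate_null_meta_z(rho)
        print(rho, round(inflation_factor(z), 3),
              round(inflation_factor(genomic_control(z)), 3))
```

Because genomic control divides every statistic by the same constant, it restores calibration without changing how variants are ranked, which is consistent with the abstract's observation that it corrects inflation but does not alter calibrated power.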

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/459b/12132175/96df150a883e/nihpp-2025.05.10.653279v1-f0001.jpg
