Causal Inference in Transcriptome-Wide Association Studies with Invalid Instruments and GWAS Summary Data.

Authors

Xue Haoran, Shen Xiaotong, Pan Wei

Affiliations

School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455.

Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota 55455.

Publication

J Am Stat Assoc. 2023;118(543):1525-1537. doi: 10.1080/01621459.2023.2183127. Epub 2023 Mar 17.

DOI: 10.1080/01621459.2023.2183127

PMID: 37808547

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10557939/

Abstract

Transcriptome-wide association studies (TWAS) have recently emerged as a popular tool for discovering (putative) causal genes. They do this by integrating an outcome GWAS dataset with a second GWAS of gene expression (the transcriptome), called an eQTL dataset. In our motivating and target application, we aim to identify causal genes for low-density lipoprotein cholesterol (LDL), which is crucial for developing new treatments for hyperlipidemia and cardiovascular diseases. The statistical principle underlying TWAS is (two-sample) two-stage least squares (2SLS) using multiple correlated SNPs as instrumental variables (IVs). It is closely related to typical (two-sample) Mendelian randomization (MR), which uses independent SNPs as IVs. Restricting to independent SNPs in this way is expected to be impractical and lower-powered for TWAS (and some other) applications. However, some of the SNPs used are often not valid IVs, for example because of widespread pleiotropy: a SNP may have direct effects on the outcome that are not mediated through the gene of interest. Such invalid IVs can lead TWAS (or MR) to false conclusions. Building on recent advances in sparse regression, we propose a robust and efficient inferential method called two-stage constrained maximum likelihood (2ScML). It is an extension of 2SLS that accounts for both hidden confounding and some invalid IVs. We first develop the proposed method with individual-level data. We then extend it, both theoretically and computationally, to GWAS summary data for the most popular two-sample TWAS design, a setting in which almost none of the existing robust IV regression methods can be applied. We show that the proposed method achieves asymptotically valid statistical inference on causal effects. We also demonstrate its wider applicability and superior finite-sample performance over the standard 2SLS/TWAS (and MR). We apply the methods to identify putative causal genes for LDL by integrating large-scale lipid GWAS summary data with eQTL data.
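The abstract's central point is that two-sample 2SLS (the principle behind TWAS) removes bias from hidden confounding but not from invalid IVs. The minimal simulation sketch below in Python (NumPy) makes this concrete. It is not the authors' 2ScML and does not use summary data. It assumes a simple linear model, approximates genotypes with correlated Gaussian variables, and uses two independent samples (an eQTL sample and an outcome GWAS sample). The sketch runs ordinary two-sample 2SLS twice. The first run treats every SNP as valid. The second is an "oracle" run that is told which SNPs have direct effects on the outcome and adjusts for them in the second stage. The function names `simulate` and `two_sample_2sls` and all parameter values are hypothetical, illustrative choices.

```python
import numpy as np


def simulate(n, gamma, alpha, beta, rho, rng):
    """Simulate SNPs Z, expression x and outcome y for one sample.

    Assumption: genotypes are approximated by correlated standard normals
    with AR(1) LD (correlation rho between adjacent SNPs).
    gamma: SNP -> expression effects; beta: causal effect of x on y.
    Nonzero entries of alpha are direct SNP -> outcome effects (invalid IVs).
    A hidden confounder u affects both x and y.
    """
    p = len(gamma)
    idx = np.arange(p)
    ld = rho ** np.abs(idx[:, None] - idx[None, :])
    Z = rng.multivariate_normal(np.zeros(p), ld, size=n)
    u = rng.normal(size=n)
    x = Z @ gamma + u + rng.normal(size=n)
    y = beta * x + Z @ alpha + u + rng.normal(size=n)
    return Z, x, y


def two_sample_2sls(Z1, x1, Z2, y2, invalid=()):
    """Two-sample 2SLS; SNP columns listed in `invalid` enter stage 2 as covariates."""
    # Stage 1 (eQTL sample): regress expression on all SNPs jointly.
    gamma_hat, *_ = np.linalg.lstsq(Z1, x1, rcond=None)
    # Stage 2 (GWAS sample): regress outcome on predicted expression,
    # adjusting for the direct effects of any SNPs flagged as invalid.
    x_hat = Z2 @ gamma_hat
    cols = np.asarray(invalid, dtype=int)
    X = np.column_stack([x_hat, Z2[:, cols]])
    coef, *_ = np.linalg.lstsq(X, y2, rcond=None)
    return coef[0]


rng = np.random.default_rng(2023)
p, beta = 10, 0.2
gamma = np.full(p, 0.3)
alpha = np.zeros(p)
alpha[[0, 1]] = 0.5  # hypothetical: SNPs 0 and 1 are pleiotropic (invalid IVs)

Z1, x1, _ = simulate(5000, gamma, alpha, beta, 0.5, rng)  # eQTL sample
Z2, _, y2 = simulate(5000, gamma, alpha, beta, 0.5, rng)  # outcome GWAS sample

naive = two_sample_2sls(Z1, x1, Z2, y2)
oracle = two_sample_2sls(Z1, x1, Z2, y2, invalid=[0, 1])
print(f"true beta = {beta}, naive 2SLS = {naive:.3f}, oracle 2SLS = {oracle:.3f}")
```

With these settings, the two pleiotropic SNPs pull the naive estimate well away from the true effect, while the oracle estimate lands close to it. The hidden confounder u does not systematically bias either estimate, because predicted expression in the GWAS sample is independent of u. In practice, however, the set of invalid SNPs is unknown, and in the most popular two-sample design only GWAS summary data are available. These are the two difficulties that the proposed 2ScML is designed to handle.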

Similar articles

Some statistical consideration in transcriptome-wide association studies.

Genet Epidemiol. 2020 Apr;44(3):221-232. doi: 10.1002/gepi.22274. Epub 2019 Dec 10.

Model checking via testing for direct effects in Mendelian Randomization and transcriptome-wide association studies.

PLoS Comput Biol. 2021 Aug 2;17(8):e1009266. doi: 10.1371/journal.pcbi.1009266. eCollection 2021 Aug.

A robust two-sample transcriptome-wide Mendelian randomization method integrating GWAS with multi-tissue eQTL summary statistics.

Genet Epidemiol. 2021 Jun;45(4):353-371. doi: 10.1002/gepi.22380. Epub 2021 Apr 9.

Statistical power of transcriptome-wide association studies.

Genet Epidemiol. 2022 Dec;46(8):572-588. doi: 10.1002/gepi.22491. Epub 2022 Jun 29.

Inferring causal direction between two traits using R² with application to transcriptome-wide association studies.

Am J Hum Genet. 2024 Aug 8;111(8):1782-1795. doi: 10.1016/j.ajhg.2024.06.013. Epub 2024 Jul 24.

DeLIVR: a deep learning approach to IV regression for testing nonlinear causal effects in transcriptome-wide association studies.

Biostatistics. 2024 Apr 15;25(2):468-485. doi: 10.1093/biostatistics/kxac051.

Inference of causal metabolite networks in the presence of invalid instrumental variables with GWAS summary data.

Genet Epidemiol. 2023 Dec;47(8):585-599. doi: 10.1002/gepi.22535. Epub 2023 Aug 13.

Inferring causal direction between two traits in the presence of horizontal pleiotropy with GWAS summary data.

PLoS Genet. 2020 Nov 2;16(11):e1009105. doi: 10.1371/journal.pgen.1009105. eCollection 2020 Nov.

Constrained maximum likelihood-based Mendelian randomization robust to both correlated and uncorrelated pleiotropic effects.

Am J Hum Genet. 2021 Jul 1;108(7):1251-1269. doi: 10.1016/j.ajhg.2021.05.014.

Cited by

A Genetics-guided Integrative Framework for Drug Repurposing: Identifying Anti-hypertensive Drug Telmisartan for Type 2 Diabetes.

medRxiv. 2025 Mar 23:2025.03.22.25324223. doi: 10.1101/2025.03.22.25324223.

Multivariate proteome-wide association study to identify causal proteins for Alzheimer disease.

Am J Hum Genet. 2025 Feb 6;112(2):291-300. doi: 10.1016/j.ajhg.2024.12.010. Epub 2025 Jan 9.

Co-expression-wide association studies link genetically regulated interactions with complex traits.

medRxiv. 2024 Dec 13:2024.10.02.24314813. doi: 10.1101/2024.10.02.24314813.

Identification of proteins associated with type 2 diabetes risk in diverse racial and ethnic populations.

Diabetologia. 2024 Dec;67(12):2754-2770. doi: 10.1007/s00125-024-06277-3. Epub 2024 Sep 30.

The goldmine of GWAS summary statistics: a systematic review of methods and tools.

BioData Min. 2024 Sep 5;17(1):31. doi: 10.1186/s13040-024-00385-x.

Inferring causal direction between two traits using R² with application to transcriptome-wide association studies.

Am J Hum Genet. 2024 Aug 8;111(8):1782-1795. doi: 10.1016/j.ajhg.2024.06.013. Epub 2024 Jul 24.

A robust cis-Mendelian randomization method with application to drug target discovery.

Nat Commun. 2024 Jul 18;15(1):6072. doi: 10.1038/s41467-024-50385-y.

MIMOSA: a resource consisting of improved methylome prediction models increases power to identify DNA methylation-phenotype associations.

Epigenetics. 2024 Dec;19(1):2370542. doi: 10.1080/15592294.2024.2370542. Epub 2024 Jul 4.

Splicing-specific transcriptome-wide association uncovers genetic mechanisms for schizophrenia.

Am J Hum Genet. 2024 Aug 8;111(8):1573-1587. doi: 10.1016/j.ajhg.2024.06.001. Epub 2024 Jun 25.

Causal relationship between circulating cytokines and follicular lymphoma: a two-sample Mendelian randomization study.

Am J Cancer Res. 2024 Apr 15;14(4):1577-1593. doi: 10.62347/JCKD6973. eCollection 2024.
