Correction for hidden confounders in the genetic analysis of gene expression.

Affiliation

Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA, USA.

Publication information

Proc Natl Acad Sci U S A. 2010 Sep 21;107(38):16465-70. doi: 10.1073/pnas.1002425107. Epub 2010 Sep 1.

DOI:10.1073/pnas.1002425107

PMID:20810919

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2944732/

Abstract

Understanding the genetic underpinnings of disease is important for screening, treatment, drug development, and basic biological insight. One way of getting at such an understanding is to find out which parts of our DNA, such as single-nucleotide polymorphisms, affect particular intermediary processes such as gene expression. Naively, such associations can be identified using a simple statistical test on all paired combinations of genetic variants and gene transcripts. However, a wide variety of confounders lie hidden in the data, leading to both spurious associations and missed associations if not properly addressed. We present a statistical model that jointly corrects for two particular kinds of hidden structure when these confounders are unknown: population structure (e.g., race, family relatedness) and microarray expression artifacts (e.g., batch effects). Applying our method to both real and synthetic, human and mouse data, we demonstrate the need for such a joint correction of confounders, and also the disadvantages of other possible approaches based on those in the current literature. In particular, we show that our class of models has maximum power to detect expression quantitative trait loci (eQTL) on synthetic data, and has the best performance on a bronze standard applied to real data. Lastly, our software and the associations we found with it are available at http://www.microsoft.com/science.
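
The abstract contrasts a naive scan (one statistical test per SNP and transcript pair) with corrections for hidden structure. The block below is a minimal sketch, assuming Python with NumPy, of why the naive scan breaks down. It simulates data in which a hidden two-group population label (an assumption of this simulation, not data from the paper) shifts both allele frequencies and the expression of many genes, while no SNP has any real effect. It then applies one simple, single-confounder correction: subtracting the top principal component from the expression matrix. That PCA step is a swapped-in illustration, not the paper's joint model, and the function names are hypothetical.

```python
import numpy as np


def naive_eqtl_scan(genotypes, expression):
    """Correlation t-statistic for every (SNP, gene) pair (hypothetical helper).

    genotypes:  (n_samples, n_snps) allele counts 0/1/2; assumes no SNP is monomorphic
    expression: (n_samples, n_genes) expression levels
    Returns an (n_snps, n_genes) array of t-statistics (approximate, n - 2 degrees of freedom).
    """
    n = genotypes.shape[0]
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    e = (expression - expression.mean(axis=0)) / expression.std(axis=0)
    # Pearson r for all pairs at once; clip so the t transform stays finite
    r = np.clip(g.T @ e / n, -0.999999, 0.999999)
    return r * np.sqrt((n - 2) / (1.0 - r**2))


def remove_expression_pcs(expression, k):
    """Subtract the top-k principal components from the centered expression matrix."""
    centered = expression - expression.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return centered - (u[:, :k] * s[:k]) @ vt[:k]


# Simulated data: a hidden population label drives both genotype and expression.
rng = np.random.default_rng(0)
n_samples, n_snps, n_genes = 200, 50, 100
pop = rng.integers(0, 2, size=n_samples)
allele_freq = np.where(pop[:, None] == 1, 0.8, 0.2)  # per-sample allele frequency
genotypes = rng.binomial(2, allele_freq, size=(n_samples, n_snps)).astype(float)
loadings = rng.normal(size=n_genes)
expression = 2.0 * pop[:, None] * loadings[None, :] + rng.normal(size=(n_samples, n_genes))

t_naive = naive_eqtl_scan(genotypes, expression)
t_corrected = naive_eqtl_scan(genotypes, remove_expression_pcs(expression, k=1))

print("mean |t| naive:    ", np.abs(t_naive).mean())
print("mean |t| corrected:", np.abs(t_corrected).mean())
```

In this simulation the naive statistics are inflated even though every true effect is zero, and removing one expression component pulls them back toward the null. On real data, removing expression components can also absorb genuine genetic signal, and it does not address confounding that appears only in the genotypes, which is the kind of trade-off a joint correction of both confounders is meant to avoid.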
