Screen and clean: a tool for identifying interactions in genome-wide association studies.

Affiliation

Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.

Publication information

Genet Epidemiol. 2010 Apr;34(3):275-85. doi: 10.1002/gepi.20459.

DOI: 10.1002/gepi.20459

PMID: 20088021

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2915560/

Abstract

Epistasis could be an important source of risk for disease. How interacting loci might be discovered is an open question for genome-wide association studies (GWAS). Most researchers limit their statistical analyses to testing individual pairwise interactions (i.e., marginal tests for association). A more effective means of identifying important predictors is to fit models that include many predictors simultaneously (i.e., higher-dimensional models). We explore a procedure called screen and clean (SC) for identifying liability loci, including interactions, by using the lasso procedure, which is a model selection tool for high-dimensional regression. We approach the problem by using a varying dictionary consisting of terms to include in the model. In the first step the lasso dictionary includes only main effects. The most promising single-nucleotide polymorphisms (SNPs) are identified using a screening procedure. Next the lasso dictionary is adjusted to include these main effects and the corresponding interaction terms. Again, promising terms are identified using lasso screening. Then significant terms are identified through the cleaning process. Implementation of SC for GWAS requires algorithms to explore the complex model space induced by the many SNPs genotyped and their interactions. We propose and explore a set of algorithms and find that SC successfully controls Type I error while yielding good power to identify risk loci and their interactions. When the method is applied to data obtained from the Wellcome Trust Case Control Consortium study of Type 1 Diabetes it uncovers evidence supporting interaction within the HLA class II region as well as within Chromosome 12q24.
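
The staged procedure described in the abstract (screen main effects with the lasso, expand the dictionary with the corresponding interaction terms, screen again, then clean) can be illustrated with a minimal sketch. The abstract does not specify data splitting, tuning, or test statistics, so everything below is an assumption made for illustration: a quantitative phenotype, a least-squares lasso solved by coordinate descent, fixed penalties `lam1` and `lam2`, a two-way split into screening and cleaning halves, centered genotypes, and a Bonferroni cutoff based on the normal approximation. All function names are hypothetical and are not taken from the authors' software.

```python
import itertools
from statistics import NormalDist

import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent: min (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid += X[:, j] * beta[j]            # take term j out of the fit
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]            # put the updated term back
    return beta


def screen(X, y, lam):
    """Indices of dictionary terms with a nonzero lasso coefficient."""
    Xc = X - X.mean(axis=0)
    beta = lasso_cd(Xc, y - y.mean(), lam)
    return [j for j in range(X.shape[1]) if beta[j] != 0.0]


def build_dictionary(G, snps):
    """Main effects of the retained SNPs plus all of their pairwise products."""
    terms = [(j,) for j in snps] + list(itertools.combinations(snps, 2))
    cols = [G[:, list(t)].prod(axis=1) for t in terms]
    return terms, np.column_stack(cols)


def clean(X, y, alpha=0.05):
    """OLS on held-out data; keep terms whose |t| exceeds a Bonferroni cutoff."""
    n, m = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (n - m - 1)
    se = np.sqrt(sigma2 * np.diag(np.linalg.pinv(A.T @ A)))
    t = coef[1:] / se[1:]
    cutoff = NormalDist().inv_cdf(1 - alpha / (2 * m))  # normal approximation
    return [k for k in range(m) if abs(t[k]) > cutoff]


def screen_and_clean(G, y, lam1, lam2, alpha=0.05, seed=0):
    """Return the dictionary terms (tuples of SNP indices) that survive cleaning."""
    G = G - G.mean(axis=0)                        # assumption: centered genotypes
    idx = np.random.default_rng(seed).permutation(len(y))
    half = len(y) // 2
    s, c = idx[:half], idx[half:]                 # screening half, cleaning half
    snps = screen(G[s], y[s], lam1)               # step 1: main effects only
    if not snps:
        return []
    terms, D = build_dictionary(G, snps)          # step 2: add interaction terms
    kept = screen(D[s], y[s], lam2)               # step 3: lasso screening again
    if not kept:
        return []
    sig = clean(D[c][:, kept], y[c], alpha)       # step 4: cleaning
    return [terms[kept[k]] for k in sig]


# Simulated example: SNPs 3 and 7 have main effects and interact.
rng = np.random.default_rng(1)
G = rng.binomial(2, 0.5, size=(600, 30)).astype(float)
g = G - G.mean(axis=0)
y = 1.5 * g[:, 3] + 1.5 * g[:, 7] + 2.0 * g[:, 3] * g[:, 7] + rng.normal(size=600)
print(screen_and_clean(G, y, lam1=0.3, lam2=0.2))
```

Because the interaction dictionary is built only from SNPs retained in the first screen, the number of candidate product terms grows with the square of the retained set rather than with the square of all genotyped SNPs. Testing on a held-out half keeps the cleaning step separate from the data used for selection.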

Similar articles

Prioritizing tests of epistasis through hierarchical representation of genomic redundancies.

Nucleic Acids Res. 2017 Aug 21;45(14):e131. doi: 10.1093/nar/gkx505.

BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies.

Am J Hum Genet. 2010 Sep 10;87(3):325-40. doi: 10.1016/j.ajhg.2010.07.021.

Increasing power of genome-wide association studies by collecting additional single-nucleotide polymorphisms.

Genetics. 2011 Jun;188(2):449-60. doi: 10.1534/genetics.111.128595. Epub 2011 Apr 5.

A whole-genome simulator capable of modeling high-order epistasis for complex disease.

Genet Epidemiol. 2013 Nov;37(7):686-94. doi: 10.1002/gepi.21761. Epub 2013 Oct 1.

Performance of epistasis detection methods in semi-simulated GWAS.

BMC Bioinformatics. 2018 Jun 18;19(1):231. doi: 10.1186/s12859-018-2229-8.

Detecting purely epistatic multi-locus interactions by an omnibus permutation test on ensembles of two-locus analyses.

BMC Bioinformatics. 2009 Sep 17;10:294. doi: 10.1186/1471-2105-10-294.

Mining gold dust under the genome wide significance level: a two-stage approach to analysis of GWAS.

Genet Epidemiol. 2011 Feb;35(2):111-8. doi: 10.1002/gepi.20556. Epub 2010 Dec 31.

iLOCi: a SNP interaction prioritization technique for detecting epistasis in genome-wide association studies.

BMC Genomics. 2012;13 Suppl 7(Suppl 7):S2. doi: 10.1186/1471-2164-13-S7-S2. Epub 2012 Dec 13.

Gene, pathway and network frameworks to identify epistatic interactions of single nucleotide polymorphisms derived from GWAS data.

BMC Syst Biol. 2012;6 Suppl 3(Suppl 3):S15. doi: 10.1186/1752-0509-6-S3-S15. Epub 2012 Dec 17.

Cited by

Frontal white and gray matter abnormality in gambling disorder: A multimodal MRI study.

J Behav Addict. 2024 Jun 26;13(2):576-586. doi: 10.1556/2006.2024.00031.

Potential application of elastic nets for shared polygenicity detection with adapted threshold selection.

Int J Biostat. 2022 Nov 3;19(2):417-438. doi: 10.1515/ijb-2020-0108. eCollection 2023 Nov 1.

A hierarchical integrative group least absolute shrinkage and selection operator for analyzing environmental mixtures.

Environmetrics. 2021 Dec;32(8). doi: 10.1002/env.2698. Epub 2021 Jul 30.

False discovery rate control in genome-wide association studies with population structure.

Proc Natl Acad Sci U S A. 2021 Oct 5;118(40). doi: 10.1073/pnas.2105841118.

MIDESP: Mutual Information-Based Detection of Epistatic SNP Pairs for Qualitative and Quantitative Phenotypes.

Biology (Basel). 2021 Sep 16;10(9):921. doi: 10.3390/biology10090921.

Variable selection methods for identifying predictor interactions in data with repeatedly measured binary outcomes.

J Clin Transl Sci. 2020 Nov 16;5(1):e59. doi: 10.1017/cts.2020.556.

Genomics models in radiotherapy: From mechanistic to machine learning.

Med Phys. 2020 Jun;47(5):e203-e217. doi: 10.1002/mp.13751.

Multi-resolution localization of causal variants across the genome.

Nat Commun. 2020 Feb 27;11(1):1093. doi: 10.1038/s41467-020-14791-2.

Efficient Signal Inclusion With Genomic Applications.

J Am Stat Assoc. 2019;114(528):1787-1799. doi: 10.1080/01621459.2018.1518236. Epub 2019 Feb 27.

Analysis of genotype by methylation interactions through sparsity-inducing regularized regression.

BMC Proc. 2018 Sep 17;12(Suppl 9):40. doi: 10.1186/s12919-018-0145-6. eCollection 2018.

References

High dimensional variable selection.

Ann Stat. 2009 Jan 1;37(5A):2178-2201. doi: 10.1214/08-aos646.

Thymus-specific deletion of insulin induces autoimmune diabetes.

EMBO J. 2009 Sep 16;28(18):2812-24. doi: 10.1038/emboj.2009.212. Epub 2009 Aug 13.

Discovering genetic ancestry using spectral graph theory.

Genet Epidemiol. 2010 Jan;34(1):51-9. doi: 10.1002/gepi.20434.

Detecting gene-gene interactions that underlie human diseases.

Nat Rev Genet. 2009 Jun;10(6):392-404. doi: 10.1038/nrg2579.

Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes.

Nat Genet. 2009 Jun;41(6):703-7. doi: 10.1038/ng.381. Epub 2009 May 10.

Using biological networks to search for interacting loci in genome-wide association studies.

Eur J Hum Genet. 2009 Oct;17(10):1231-40. doi: 10.1038/ejhg.2009.15. Epub 2009 Mar 11.

Genome-wide association analysis by lasso penalized logistic regression.

Bioinformatics. 2009 Mar 15;25(6):714-21. doi: 10.1093/bioinformatics/btp041. Epub 2009 Jan 28.

Confirmation of HLA class II independent type 1 diabetes associations in the major histocompatibility complex including HLA-B and HLA-A.

Diabetes Obes Metab. 2009 Feb;11 Suppl 1(Suppl 1):31-45. doi: 10.1111/j.1463-1326.2008.01001.x.

Epistasis--the essential role of gene interactions in the structure and evolution of genetic systems.

Nat Rev Genet. 2008 Nov;9(11):855-67. doi: 10.1038/nrg2452.

Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies.

PLoS Genet. 2008 Jul 25;4(7):e1000130. doi: 10.1371/journal.pgen.1000130.
