Practical issues in screening and variable selection in genome-wide association analysis.

Authors

Hong Sungyeon, Kim Yongkang, Park Taesung

Affiliations

Department of Statistics, Seoul National University, Seoul, South Korea.

Department of Statistics, Seoul National University, Seoul, South Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea.

Publication information

Cancer Inform. 2015 Jan 14;13(Suppl 7):55-65. doi: 10.4137/CIN.S16350. eCollection 2014.

DOI: 10.4137/CIN.S16350
PMID: 25635166
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4298256/
Abstract

Variable selection methods play an important role in high-dimensional statistical modeling and analysis. Computational cost and estimation accuracy are the two main concerns for statistical inference from ultrahigh-dimensional data. In particular, genome-wide association studies (GWAS), which focus on identifying single nucleotide polymorphisms (SNPs) associated with a disease of interest, have produced ultrahigh-dimensional data. Numerous methods have been proposed to handle GWAS data. Most statistical methods have adopted a two-stage approach: pre-screening for dimensional reduction and variable selection to identify causal SNPs. The pre-screening step selects SNPs in terms of their P-values or the absolute values of the regression coefficients in single SNP analysis. Penalized regressions, such as the ridge, lasso, adaptive lasso, and elastic-net regressions, are commonly used for the variable selection step. In this paper, we investigate which combination of pre-screening method and penalized regression performs best on a quantitative phenotype using two real GWAS datasets.
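
The pre-screening step in the abstract ranks SNPs using a single-SNP (marginal) regression. The sketch below is an illustration, not the authors' code. It assumes additive 0/1/2 genotype coding, no missing genotypes, and no covariates. P-values use a normal approximation to the t distribution, which is an assumption that is usually adequate at GWAS sample sizes. The function names `single_snp_scan` and `prescreen`, the cutoff `k`, and the simulated data are all hypothetical.

```python
import math
import random
from statistics import NormalDist


def single_snp_scan(genotypes, phenotype):
    """Regress the phenotype on each SNP separately (y = a + b*x + e).

    genotypes: list of SNP columns, each a list of 0/1/2 minor-allele counts.
    phenotype: quantitative trait values, one per individual.
    Returns (snp_index, beta_hat, p_value) for every polymorphic SNP.
    """
    n = len(phenotype)
    y_bar = sum(phenotype) / n
    results = []
    for j, x in enumerate(genotypes):
        x_bar = sum(x) / n
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        if sxx == 0:  # monomorphic SNP carries no information
            continue
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, phenotype))
        beta = sxy / sxx
        rss = sum((yi - y_bar - beta * (xi - x_bar)) ** 2 for xi, yi in zip(x, phenotype))
        se = math.sqrt(rss / (n - 2) / sxx)
        # Two-sided P-value from a normal approximation to the t statistic
        # (assumption: large n, as in GWAS).
        p = 0.0 if se == 0 else 2 * (1 - NormalDist().cdf(abs(beta / se)))
        results.append((j, beta, p))
    return results


def prescreen(results, k, by="pvalue"):
    """Keep the indices of the top-k SNPs, ranked by P-value or by |beta_hat|."""
    if by == "pvalue":
        ranked = sorted(results, key=lambda r: r[2])
    elif by == "abs_beta":
        ranked = sorted(results, key=lambda r: abs(r[1]), reverse=True)
    else:
        raise ValueError("by must be 'pvalue' or 'abs_beta'")
    return [j for j, _, _ in ranked[:k]]


# Hypothetical toy data: 200 individuals, 50 SNPs, SNPs 3 and 17 causal.
random.seed(0)
n_ind, n_snp = 200, 50
mafs = [random.uniform(0.2, 0.5) for _ in range(n_snp)]
G = [[sum(random.random() < maf for _ in range(2)) for _ in range(n_ind)] for maf in mafs]
y = [1.5 * G[3][i] - 1.5 * G[17][i] + random.gauss(0, 1) for i in range(n_ind)]

scan = single_snp_scan(G, y)
top_by_p = prescreen(scan, k=5, by="pvalue")
top_by_beta = prescreen(scan, k=5, by="abs_beta")
print(top_by_p, top_by_beta)
```

In this sketch every SNP is tested on the same individuals, so ranking by P-value is equivalent to ranking by |t| = |beta_hat| / SE. Ranking by |beta_hat| ignores the standard error. The two rules can therefore disagree when a low-frequency SNP has a large but noisy effect estimate.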

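
In the second stage, the penalized regressions listed in the abstract differ mainly in their penalty, so a single coordinate-descent routine can illustrate all of them. The sketch assumes standardized SNP columns (mean 0, variance 1), a centered phenotype, and the elastic-net objective (1/2n)*||y - Xb||^2 + lam * sum_j [alpha*w_j*|b_j| + (1 - alpha)/2 * b_j^2]. Setting alpha = 1 gives the lasso and alpha = 0 gives ridge; values in between give the elastic net. Data-driven weights such as w_j = 1/|ridge estimate| give an adaptive lasso. This is not the paper's implementation. The helper names, the penalty values `lam`, and the simulated data are hypothetical, and in practice lam would typically be tuned by cross-validation.

```python
import math
import random


def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]


def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, alpha=1.0, weights=None, max_sweeps=1000, tol=1e-8):
    """Cyclic coordinate descent for
        (1/2n)||y - Xb||^2 + lam * sum_j [alpha*w_j*|b_j| + (1 - alpha)/2 * b_j^2].

    X: list of standardized SNP columns; y: centered phenotype.
    Returns coefficients on the standardized scale.
    """
    p, n = len(X), len(y)
    w = weights if weights is not None else [1.0] * p
    b = [0.0] * p
    r = list(y)  # current residual y - Xb (b starts at zero)
    for _ in range(max_sweeps):
        max_change = 0.0
        for j, xj in enumerate(X):
            # x_j'(partial residual)/n; because mean(x_j^2) = 1 this is x_j'r/n + b_j.
            z = sum(xi * ri for xi, ri in zip(xj, r)) / n + b[j]
            new = soft_threshold(z, lam * alpha * w[j]) / (1.0 + lam * (1.0 - alpha))
            delta = new - b[j]
            if delta != 0.0:
                r = [ri - delta * xi for ri, xi in zip(r, xj)]
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


# Hypothetical toy data: 200 individuals, 10 SNPs assumed to have passed
# pre-screening, with SNPs 0 and 4 causal.
random.seed(1)
n, p = 200, 10
effects = {0: 1.5, 4: -1.5}
G = [[sum(random.random() < 0.3 for _ in range(2)) for _ in range(n)] for _ in range(p)]
y_raw = [sum(effects.get(j, 0.0) * G[j][i] for j in range(p)) + random.gauss(0, 1)
         for i in range(n)]
X = [standardize(col) for col in G]
y_bar = sum(y_raw) / n
y = [v - y_bar for v in y_raw]

ridge = elastic_net(X, y, lam=0.5, alpha=0.0)
lasso = elastic_net(X, y, lam=0.3, alpha=1.0)
enet = elastic_net(X, y, lam=0.3, alpha=0.5)
# Adaptive lasso: penalty weights taken from an initial ridge fit.
adaptive = elastic_net(X, y, lam=0.1, alpha=1.0, weights=[1 / abs(c) for c in ridge])

lasso_selected = [j for j, c in enumerate(lasso) if c != 0.0]
print("lasso keeps SNPs", lasso_selected)
```

With a large enough `lam`, every coefficient is set exactly to zero. The ridge fit (alpha = 0), by contrast, shrinks coefficients without zeroing them, which is why ridge on its own ranks SNPs rather than selecting them.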

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c63d/4298256/24a2f5c2e3c7/cin-suppl.7-2014-055f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c63d/4298256/7e3d664f561b/cin-suppl.7-2014-055f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c63d/4298256/452c24940f45/cin-suppl.7-2014-055f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c63d/4298256/56565cec4793/cin-suppl.7-2014-055f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c63d/4298256/6edc2971c65e/cin-suppl.7-2014-055f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c63d/4298256/b33f4c9906d1/cin-suppl.7-2014-055f6.jpg

Similar articles

1. Practical issues in screening and variable selection in genome-wide association analysis.
   Cancer Inform. 2015 Jan 14;13(Suppl 7):55-65. doi: 10.4137/CIN.S16350. eCollection 2014.
2. Penalized Regression and Risk Prediction in Genome-Wide Association Studies.
   Stat Anal Data Min. 2013 Aug 1;6(4). doi: 10.1002/sam.11183.
3. Evaluation of the lasso and the elastic net in genome-wide association studies.
   Front Genet. 2013 Dec 4;4:270. doi: 10.3389/fgene.2013.00270. eCollection 2013.
4. Prediction of Quantitative Traits Using Common Genetic Variants: Application to Body Mass Index.
   Genomics Inform. 2016 Dec;14(4):149-159. doi: 10.5808/GI.2016.14.4.149. Epub 2016 Dec 30.
5. Exploiting Linkage Disequilibrium for Ultrahigh-Dimensional Genome-Wide Data with an Integrated Statistical Approach.
   Genetics. 2016 Feb;202(2):411-26. doi: 10.1534/genetics.115.179507. Epub 2015 Dec 12.
6. Performance of a blockwise approach in variable selection using linkage disequilibrium information.
   BMC Bioinformatics. 2015 May 8;16:148. doi: 10.1186/s12859-015-0556-6.
7. A fast algorithm for detecting gene-gene interactions in genome-wide association studies.
   Ann Appl Stat. 2014;8(4):2292-2318. doi: 10.1214/14-aoas771.
8. High-dimensional Cox models: the choice of penalty as part of the model building process.
   Biom J. 2010 Feb;52(1):50-69. doi: 10.1002/bimj.200900064.
9. Iterative hard thresholding for model selection in genome-wide association studies.
   Genet Epidemiol. 2017 Dec;41(8):756-768. doi: 10.1002/gepi.22068. Epub 2017 Sep 6.
10. Lost in Translation: On the Problem of Data Coding in Penalized Whole Genome Regression with Interactions.
   G3 (Bethesda). 2019 Apr 9;9(4):1117-1129. doi: 10.1534/g3.118.200961.

Cited by

1. Identifying and overcoming COVID-19 vaccination impediments using Bayesian data mining techniques.
   Sci Rep. 2024 Apr 13;14(1):8595. doi: 10.1038/s41598-024-58902-1.
2. SNP variable selection by generalized graph domination.
   PLoS One. 2019 Jan 24;14(1):e0203242. doi: 10.1371/journal.pone.0203242. eCollection 2019.
3. Cytogenetically visible copy number variations (CG-CNVs) in banding and molecular cytogenetics of human; about heteromorphisms and euchromatic variants.
   Mol Cytogenet. 2016 Jan 22;9:5. doi: 10.1186/s13039-016-0216-1. eCollection 2016.