A variable selection method for genome-wide association studies.

Affiliation

Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA.

Publication information

Bioinformatics. 2011 Jan 1;27(1):1-8. doi: 10.1093/bioinformatics/btq600. Epub 2010 Oct 29.

DOI:10.1093/bioinformatics/btq600
PMID:21036813
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3025714/
Abstract

MOTIVATION

Genome-wide association studies (GWAS) involving half a million or more single nucleotide polymorphisms (SNPs) allow genetic dissection of complex diseases in a holistic manner. The common practice of analyzing one SNP at a time does not fully realize the potential of GWAS to identify multiple causal variants and to predict risk of disease. Existing methods for joint analysis of GWAS data tend to miss causal SNPs that are marginally uncorrelated with disease and have high false discovery rates (FDRs).

RESULTS

We introduce GWASelect, a statistically powerful and computationally efficient variable selection method designed to tackle the unique challenges of GWAS data. This method searches iteratively over the potential SNPs conditional on previously selected SNPs and is thus capable of capturing causal SNPs that are marginally correlated with disease as well as those that are marginally uncorrelated with disease. A special resampling mechanism is built into the method to reduce false positive findings. Simulation studies demonstrate that GWASelect performs well under a wide spectrum of linkage disequilibrium patterns and can be substantially more powerful than existing methods in capturing causal variants while having a lower FDR. In addition, regression models based on GWASelect tend to yield more accurate prediction of disease risk than existing methods. The advantages of GWASelect are illustrated with the Wellcome Trust Case-Control Consortium (WTCCC) data.
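The two ideas in this paragraph, searching conditional on already-selected SNPs and resampling to filter false positives, can be illustrated with a small sketch. This is not the GWASelect software or its algorithm: the abstract does not give those details. The sketch assumes a quantitative trait and Gaussian stand-ins for genotypes, replaces the authors' procedure with greedy forward selection on residuals, and uses random half-samples with a selection-frequency threshold as the resampling step. All function names, `rho`, `n_steps`, `n_resamples` and `threshold` are hypothetical choices made for illustration.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    u, v = center(u), center(v)
    denom = (dot(u, u) * dot(v, v)) ** 0.5
    return dot(u, v) / denom if denom > 0 else 0.0

def residualize(v, q):
    """Remove from v its projection onto q (both assumed centered)."""
    qq = dot(q, q)
    if qq == 0:
        return v
    coef = dot(v, q) / qq
    return [a - coef * b for a, b in zip(v, q)]

def conditional_forward_select(columns, y, n_steps):
    """Greedy search: each step picks the predictor most correlated with the
    trait after adjusting both for the predictors already selected."""
    cols = [center(c) for c in columns]
    resid = center(y)
    selected = []
    for _ in range(n_steps):
        candidates = [j for j in range(len(cols)) if j not in selected]
        best = max(candidates, key=lambda j: abs(corr(cols[j], resid)))
        selected.append(best)
        q = cols[best]
        resid = residualize(resid, q)
        cols = [c if j in selected else residualize(c, q)
                for j, c in enumerate(cols)]
    return selected

def resampled_selection(columns, y, n_steps, n_resamples, threshold, rng):
    """Repeat the search on random half-samples and keep predictors chosen
    in at least `threshold` of the resamples (assumed stand-in mechanism)."""
    n = len(y)
    counts = [0] * len(columns)
    for _ in range(n_resamples):
        idx = rng.sample(range(n), n // 2)
        sub_cols = [[c[i] for i in idx] for c in columns]
        sub_y = [y[i] for i in idx]
        for j in conditional_forward_select(sub_cols, sub_y, n_steps):
            counts[j] += 1
    return [j for j, c in enumerate(counts) if c / n_resamples >= threshold]

def simulate(n, p, rng, rho=0.5):
    """Toy data: predictors 0 and 1 are both causal, but the effect of
    predictor 1 is chosen so that its marginal covariance with y is zero."""
    cols = [[] for _ in range(p)]
    y = []
    for _ in range(n):
        x0 = rng.gauss(0, 1)
        x1 = rho * x0 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        row = [x0, x1] + [rng.gauss(0, 1) for _ in range(p - 2)]
        for j, v in enumerate(row):
            cols[j].append(v)
        y.append(x0 - rho * x1 + rng.gauss(0, 0.5))
    return cols, y

rng = random.Random(0)
columns, y = simulate(n=600, p=30, rng=rng)
marginal_r1 = corr(columns[1], y)
stable = resampled_selection(columns, y, n_steps=2, n_resamples=20,
                             threshold=0.8, rng=rng)
print("marginal correlation of predictor 1 with trait:", round(marginal_r1, 3))
print("predictors kept after resampling:", stable)
```

In this toy setup predictor 1 would be skipped by one-at-a-time screening because its marginal correlation with the trait is close to zero, yet it becomes the strongest candidate once predictor 0 has been adjusted for. A real case-control analysis would use a logistic model and genotype coding instead of the linear residualization assumed here.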

AVAILABILITY

The software implementing GWASelect is available at http://www.bios.unc.edu/~lin. Access to WTCCC data: http://www.wtccc.org.uk/.
