Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies.

Authors

Mieth Bettina, Kloft Marius, Rodríguez Juan Antonio, Sonnenburg Sören, Vobruba Robin, Morcillo-Suárez Carlos, Farré Xavier, Marigorta Urko M, Fehr Ernst, Dickhaus Thorsten, Blanchard Gilles, Schunk Daniel, Navarro Arcadi, Müller Klaus-Robert

Affiliations

Machine Learning Group, Technische Universität Berlin, Berlin, 10587, Germany.

Department of Computer Science, Humboldt University of Berlin, Berlin, 10099, Germany.

Publication information

Sci Rep. 2016 Nov 28;6:36671. doi: 10.1038/srep36671.

DOI:10.1038/srep36671
PMID:27892471
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5125008/
Abstract

The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008-2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.
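
The two-step idea in the abstract (screen SNPs with a support vector machine, then test only the screened candidates against a corrected threshold) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the COMBI implementation shipped with GWASpi: the function names, the simulated data, the Pegasos-style subgradient training, the trend statistic n·r², and the placeholder cut-off alpha/k are all choices made for this sketch. In particular, alpha/k ignores that the candidates were selected on the same data, so it does not reproduce the "adequate threshold correction" the abstract refers to.

```python
import math
import random


def simulate(n=400, d=40, causal=7, maf=0.3, seed=1):
    """Hypothetical toy data: genotypes coded 0/1/2, labels +1 (case) / -1 (control)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [(rng.random() < maf) + (rng.random() < maf) for _ in range(d)]
        logit = 1.5 * (row[causal] - 2 * maf)  # only one SNP carries signal
        p_case = 1.0 / (1.0 + math.exp(-logit))
        y.append(1 if rng.random() < p_case else -1)
        X.append(row)
    return X, y


def standardize(X):
    """Center and scale each SNP column (the SVM below has no bias term)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) for j in range(d)]
    return [[(row[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0 for j in range(d)]
            for row in X]


def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Linear SVM (hinge loss, L2 penalty) via Pegasos-style subgradient steps."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, t = [0.0] * d, 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the penalty
            if margin < 1.0:  # margin violated: move towards this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def trend_test_pvalue(x, y):
    """P-value of the statistic n * r^2, referred to a chi-square with 1 d.o.f."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    stat = n * sxy * sxy / (sxx * syy)
    return math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) survival function


def select_then_test(X, y, k=5, alpha=0.05):
    """Step 1: keep the k SNPs with largest |SVM weight|. Step 2: test only those."""
    d = len(X[0])
    w = train_linear_svm(standardize(X), y)
    candidates = sorted(range(d), key=lambda j: -abs(w[j]))[:k]
    threshold = alpha / k  # placeholder cut-off, an assumption of this sketch
    hits = []
    for j in sorted(candidates):
        p = trend_test_pvalue([row[j] for row in X], y)
        if p < threshold:
            hits.append((j, p))
    return candidates, hits


X, y = simulate()
candidates, hits = select_then_test(X, y)
print("candidate SNPs:", sorted(candidates))
print("significant SNPs:", [j for j, _ in hits])
```

On the simulated data, the SNP at index 7 is the only one generated with an effect, so it should appear among both the candidates and the significant hits. A real analysis would need a threshold calibrated for the selection step rather than the fixed alpha/k used here.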

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9527/5125008/e3c8d2f852a6/srep36671-f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9527/5125008/0e14b9a13ae1/srep36671-f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9527/5125008/56592fa1e603/srep36671-f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9527/5125008/6ccb2364d603/srep36671-f4.jpg

Similar articles

1
Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies.
Sci Rep. 2016 Nov 28;6:36671. doi: 10.1038/srep36671.
2
Assessing statistical significance in multivariable genome wide association analysis.
Bioinformatics. 2016 Jul 1;32(13):1990-2000. doi: 10.1093/bioinformatics/btw128. Epub 2016 Mar 7.
3
SNP-based pathway enrichment analysis for genome-wide association studies.
BMC Bioinformatics. 2011 Apr 15;12:99. doi: 10.1186/1471-2105-12-99.
4
Utilizing Deep Learning and Genome Wide Association Studies for Epistatic-Driven Preterm Birth Classification in African-American Women.
IEEE/ACM Trans Comput Biol Bioinform. 2020 Mar-Apr;17(2):668-678. doi: 10.1109/TCBB.2018.2868667. Epub 2018 Sep 3.
5
Leveraging existing GWAS summary data of genetically correlated and uncorrelated traits to improve power for a new GWAS.
Genet Epidemiol. 2020 Oct;44(7):717-732. doi: 10.1002/gepi.22333. Epub 2020 Jul 16.
6
DeepCOMBI: explainable artificial intelligence for the analysis and discovery in genome-wide association studies.
NAR Genom Bioinform. 2021 Jul 20;3(3):lqab065. doi: 10.1093/nargab/lqab065. eCollection 2021 Sep.
7
r2VIM: A new variable selection method for random forests in genome-wide association studies.
BioData Min. 2016 Feb 1;9:7. doi: 10.1186/s13040-016-0087-3. eCollection 2016.
8
Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.
BMC Genomics. 2015;16 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2164-16-S2-S5. Epub 2015 Jan 21.
9
Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies.
BMC Genet. 2009 Jun 16;10:27. doi: 10.1186/1471-2156-10-27.
10
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Cited by

1
Data simulation to optimize frameworks for genome-wide association studies in diverse populations.
Front Genet. 2025 Jun 18;16:1559496. doi: 10.3389/fgene.2025.1559496. eCollection 2025.
2
Machine learning-based screening of asthma biomarkers and related immune infiltration.
Front Allergy. 2025 Jan 29;6:1506608. doi: 10.3389/falgy.2025.1506608. eCollection 2025.
3
GWAS for identification of genomic regions and candidate genes in vegetable crops.
Funct Integr Genomics. 2024 Oct 29;24(6):203. doi: 10.1007/s10142-024-01477-x.
4
Multi-trait modeling and machine learning discover new markers associated with stem traits in alfalfa.
Front Plant Sci. 2024 Sep 9;15:1429976. doi: 10.3389/fpls.2024.1429976. eCollection 2024.
5
Modeling Chickpea Productivity with Artificial Image Objects and Convolutional Neural Network.
Plants (Basel). 2024 Sep 1;13(17):2444. doi: 10.3390/plants13172444.
6
Reviewing the essential roles of remote phenotyping, GWAS and explainable AI in practical marker-assisted selection for drought-tolerant winter wheat breeding.
Front Plant Sci. 2024 Apr 18;15:1319938. doi: 10.3389/fpls.2024.1319938. eCollection 2024.
7
Exploiting integrative metabolomics to study host-parasite interactions in Plasmodium infections.
Trends Parasitol. 2024 Apr;40(4):313-323. doi: 10.1016/j.pt.2024.02.007. Epub 2024 Mar 19.
8
Machine Learning to Advance Human Genome-Wide Association Studies.
Genes (Basel). 2023 Dec 25;15(1):34. doi: 10.3390/genes15010034.
9
Discovering SNP-disease relationships in genome-wide SNP data using an improved harmony search based on SNP locus and genetic inheritance patterns.
PLoS One. 2023 Oct 13;18(10):e0292266. doi: 10.1371/journal.pone.0292266. eCollection 2023.
10
High-dimensional supervised classification in a context of non-independence of observations to identify the determining SNPs in a phenotype.
Infect Dis Model. 2023 Sep 9;8(4):1079-1087. doi: 10.1016/j.idm.2023.09.002. eCollection 2023 Dec.

References

1
A fast algorithm for detecting gene-gene interactions in genome-wide association studies.
Ann Appl Stat. 2014;8(4):2292-2318. doi: 10.1214/14-aoas771.
2
Risk Classification with an Adaptive Naive Bayes Kernel Machine Model.
J Am Stat Assoc. 2015 Apr 22;110(509):393-404. doi: 10.1080/01621459.2014.908778.
3
Posterior predictive checks to quantify lack-of-fit in admixture models of latent population structure.
Proc Natl Acad Sci U S A. 2015 Jun 30;112(26):E3441-50. doi: 10.1073/pnas.1412301112. Epub 2015 Jun 12.
4
Testing for genetic associations in arbitrarily structured populations.
Nat Genet. 2015 May;47(5):550-4. doi: 10.1038/ng.3244. Epub 2015 Mar 30.
5
Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.
BMC Genomics. 2015;16 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2164-16-S2-S5. Epub 2015 Jan 21.
6
Efficient Bayesian mixed-model analysis increases association power in large cohorts.
Nat Genet. 2015 Mar;47(3):284-90. doi: 10.1038/ng.3190. Epub 2015 Feb 2.
7
Bayesian feature selection for high-dimensional linear regression via the Ising approximation with applications to genomics.
Bioinformatics. 2015 Jun 1;31(11):1754-61. doi: 10.1093/bioinformatics/btv037. Epub 2015 Jan 24.
8
Regularized machine learning in the genetic prediction of complex traits.
PLoS Genet. 2014 Nov 13;10(11):e1004754. doi: 10.1371/journal.pgen.1004754. eCollection 2014 Nov.
9
Variable selection in Bayesian generalized linear-mixed models: an illustration using candidate gene case-control association studies.
Biom J. 2015 Mar;57(2):234-53. doi: 10.1002/bimj.201300259. Epub 2014 Sep 30.
10
Efficient multivariate linear mixed model algorithms for genome-wide association studies.
Nat Methods. 2014 Apr;11(4):407-9. doi: 10.1038/nmeth.2848. Epub 2014 Feb 16.