A scalable, knowledge-based analysis framework for genetic association studies.

Affiliation

Bioinformatics Research Group, Bina Nusantara University, Jakarta, Indonesia.

Publication information

BMC Bioinformatics. 2013 Oct 23;14:312. doi: 10.1186/1471-2105-14-312.

DOI: 10.1186/1471-2105-14-312
PMID: 24152222
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4015032/
Abstract

BACKGROUND

Testing for marginal associations between numerous genetic variants and disease may miss complex relationships among variables (e.g., gene-gene interactions). Bayesian approaches can model multiple variables together and offer advantages over conventional model building strategies, including using existing biological evidence as modeling priors and acknowledging that many models may fit the data well. With many candidate variables, Bayesian approaches to variable selection rely on algorithms to approximate the posterior distribution of models, such as Markov-Chain Monte Carlo (MCMC). Unfortunately, MCMC is difficult to parallelize and requires many iterations to adequately sample the posterior. We introduce a scalable algorithm called PEAK that improves the efficiency of MCMC by dividing a large set of variables into related groups using a rooted graph that resembles a mountain peak. Our algorithm takes advantage of parallel computing and existing biological databases when available.
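
To make the idea of MCMC-based variable selection concrete, here is a minimal Python sketch. It is not the PEAK algorithm itself. It runs a Metropolis-Hastings chain over sets of included variables, proposing a flip of one variable at a time, and reports how often each variable appears in the sampled models. The function name `mh_variable_selection`, the `log_post` callback, and the toy additive score are assumptions made for illustration; a real analysis would supply a model likelihood and prior instead.

```python
import math
import random


def mh_variable_selection(log_post, n_vars, n_iter=20000, seed=0):
    """Metropolis-Hastings over inclusion sets with single-variable flips.

    log_post: hypothetical user-supplied function that maps a frozenset of
              included variable indices to an unnormalised log posterior.
    Returns the estimated posterior inclusion probability of each variable.
    """
    rng = random.Random(seed)
    model = set()                              # start from the empty model
    current = log_post(frozenset(model))
    counts = [0] * n_vars
    for _ in range(n_iter):
        j = rng.randrange(n_vars)              # symmetric proposal: flip one variable
        proposal = set(model)
        proposal.symmetric_difference_update({j})
        candidate = log_post(frozenset(proposal))
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < candidate - current:
            model, current = proposal, candidate
        for k in model:                        # record the current model's members
            counts[k] += 1
    return [c / n_iter for c in counts]


# Toy score (an assumption, not from the paper): each variable contributes
# an independent weight, so the exact inclusion probability is 1 / (1 + exp(-w)).
weights = [3.0, -3.0, 0.0]


def toy_log_post(model):
    return sum(weights[j] for j in model)


probs = mh_variable_selection(toy_log_post, n_vars=len(weights))
print([round(p, 2) for p in probs])
```

Because the toy score is additive, the estimates can be checked against the closed-form values. The chain is also sequential, since each step depends on the previous state, which illustrates why a single chain of this kind is hard to parallelize.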

RESULTS

By using graphs to manage a model space with more than 500,000 candidate variables, we were able to improve MCMC efficiency and uncover the true simulated causal variables, including a gene-gene interaction. We applied PEAK to a case-control study of childhood asthma with 2,521 genetic variants. We used an informative graph for oxidative stress derived from Gene Ontology and identified several variants in ERBB4, OXR1, and BCL2 with strong evidence for associations with childhood asthma.

CONCLUSIONS

We introduced an extremely flexible analysis framework capable of efficiently performing Bayesian variable selection on many candidate variables. The PEAK algorithm can be provided with an informative graph, which can be advantageous when considering gene-gene interactions, or a symmetric graph, which simply divides the model space into manageable regions. The PEAK framework is compatible with various model forms, allowing for the algorithm to be configured for different study designs and applications, such as pathway or rare-variant analyses, by simple modifications to the model likelihood and proposal functions.
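
As a rough illustration of how a symmetric graph might divide a model space into manageable regions, the Python sketch below recursively splits a list of variables into a rooted tree with near-equal branches. This is one plausible reading of the description above, not the graph construction used by PEAK. The names `build_symmetric_tree`, `leaf_groups`, `max_leaf_size`, and `branching` are hypothetical.

```python
def build_symmetric_tree(variables, max_leaf_size, branching=2):
    """Split variables into a rooted tree whose leaves hold small groups.

    Hypothetical helper: each node stores the variables it covers and its
    children; splitting stops once a node has at most max_leaf_size variables.
    """
    if max_leaf_size < 1 or branching < 2:
        raise ValueError("max_leaf_size must be >= 1 and branching >= 2")
    variables = list(variables)
    if len(variables) <= max_leaf_size:
        return {"variables": variables, "children": []}
    size = -(-len(variables) // branching)     # ceiling division
    chunks = [variables[i:i + size] for i in range(0, len(variables), size)]
    children = [build_symmetric_tree(c, max_leaf_size, branching) for c in chunks]
    return {"variables": variables, "children": children}


def leaf_groups(node):
    """Return the variable groups at the leaves, left to right."""
    if not node["children"]:
        return [node["variables"]]
    groups = []
    for child in node["children"]:
        groups.extend(leaf_groups(child))
    return groups


# Example with made-up variable names.
snps = ["snp%d" % i for i in range(10)]
tree = build_symmetric_tree(snps, max_leaf_size=3)
for group in leaf_groups(tree):
    print(group)
```

Each leaf group could then be explored by a separate sampler. An informative graph would replace the even split with groupings taken from a biological database such as Gene Ontology.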


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d3f/4015032/6465764e6cd0/1471-2105-14-312-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d3f/4015032/05d6f08df5ce/1471-2105-14-312-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d3f/4015032/e79f058f777f/1471-2105-14-312-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d3f/4015032/1b242a611f0f/1471-2105-14-312-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d3f/4015032/57e106780852/1471-2105-14-312-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d3f/4015032/6dda8455db16/1471-2105-14-312-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d3f/4015032/969b6777acdf/1471-2105-14-312-7.jpg

Similar articles

1
A scalable, knowledge-based analysis framework for genetic association studies.
BMC Bioinformatics. 2013 Oct 23;14:312. doi: 10.1186/1471-2105-14-312.
2
Genetic studies of complex human diseases: characterizing SNP-disease associations using Bayesian networks.
BMC Syst Biol. 2012;6 Suppl 3(Suppl 3):S14. doi: 10.1186/1752-0509-6-S3-S14. Epub 2012 Dec 17.
3
Highly scalable maximum likelihood and conjugate Bayesian inference for ERGMs on graph sets with equivalent vertices.
PLoS One. 2022 Aug 26;17(8):e0273039. doi: 10.1371/journal.pone.0273039. eCollection 2022.
4
Fast genomic prediction of breeding values using parallel Markov chain Monte Carlo with convergence diagnosis.
BMC Bioinformatics. 2018 Jan 3;19(1):3. doi: 10.1186/s12859-017-2003-3.
5
A fast algorithm for Bayesian multi-locus model in genome-wide association studies.
Mol Genet Genomics. 2017 Aug;292(4):923-934. doi: 10.1007/s00438-017-1322-4. Epub 2017 May 22.
6
Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics.
Genet Sel Evol. 2012 Sep 25;44(1):29. doi: 10.1186/1297-9686-44-29.
7
Discovery of complex pathways from observational data.
Stat Med. 2010 Aug 30;29(19):1998-2011. doi: 10.1002/sim.3962.
8
On the inference of complex phylogenetic networks by Markov Chain Monte-Carlo.
PLoS Comput Biol. 2021 Sep 3;17(9):e1008380. doi: 10.1371/journal.pcbi.1008380. eCollection 2021 Sep.
9
Bayesian coestimation of phylogeny and sequence alignment.
BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.
10
Fast Bayesian whole-brain fMRI analysis with spatial 3D priors.
Neuroimage. 2017 Feb 1;146:211-225. doi: 10.1016/j.neuroimage.2016.11.040. Epub 2016 Nov 19.

Cited by

1
Integrated machine learning and bioinformatic analysis of mitochondrial-related signature in chronic rhinosinusitis with nasal polyps.
World Allergy Organ J. 2024 Sep 19;17(10):100964. doi: 10.1016/j.waojou.2024.100964. eCollection 2024 Oct.
2
Airway Epithelial Dysfunction in Asthma: Relevant to Epidermal Growth Factor Receptors and Airway Epithelial Cells.
J Clin Med. 2020 Nov 18;9(11):3698. doi: 10.3390/jcm9113698.
3
Metabolic Pathway Analysis and Effectiveness of Tamoxifen in Danish Breast Cancer Patients.
Cancer Epidemiol Biomarkers Prev. 2020 Mar;29(3):582-590. doi: 10.1158/1055-9965.EPI-19-0833. Epub 2020 Jan 13.
4
A two-phase Bayesian methodology for the analysis of binary phenotypes in genome-wide association studies.
Biom J. 2020 Jan;62(1):191-201. doi: 10.1002/bimj.201900050. Epub 2019 Sep 4.
5
Assessment of genetic factor and depression interactions for asthma symptom severity in cohorts of childhood and elderly asthmatics.
Exp Mol Med. 2018 Jul 4;50(7):1-7. doi: 10.1038/s12276-018-0110-5.
6
Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases.
Am J Epidemiol. 2017 Oct 1;186(7):753-761. doi: 10.1093/aje/kwx227.
7
The soft computing-based approach to investigate allergic diseases: a systematic review.
Clin Mol Allergy. 2017 Apr 13;15:10. doi: 10.1186/s12948-017-0066-3. eCollection 2017.
8
Missing heritability of common diseases and treatments outside the protein-coding exome.
Hum Genet. 2014 Oct;133(10):1199-215. doi: 10.1007/s00439-014-1476-7. Epub 2014 Aug 9.

References

1
Incorporating model uncertainty in detecting rare variants: the Bayesian risk index.
Genet Epidemiol. 2011 Nov;35(7):638-49. doi: 10.1002/gepi.20613. Epub 2011 Aug 26.
2
Use of pathway information in molecular epidemiology.
Hum Genomics. 2009 Oct;4(1):21-42. doi: 10.1186/1479-7364-4-1-21.
3
MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes.
Genet Epidemiol. 2010 Dec;34(8):816-34. doi: 10.1002/gepi.20533.
4
Discovery of complex pathways from observational data.
Stat Med. 2010 Aug 30;29(19):1998-2011. doi: 10.1002/sim.3962.
5
Finding the missing heritability of complex diseases.
Nature. 2009 Oct 8;461(7265):747-53. doi: 10.1038/nature08494.
6
Approaches to complex pathways in molecular epidemiology: summary of a special conference of the American Association for Cancer Research.
Cancer Res. 2008 Dec 15;68(24):10028-30. doi: 10.1158/0008-5472.CAN-08-1690.
7
Particulate air pollutants and asthma. A paradigm for the role of oxidative stress in PM-induced adverse health effects.
Clin Immunol. 2003 Dec;109(3):250-65. doi: 10.1016/j.clim.2003.08.006.
8
Inference of population structure using multilocus genotype data.
Genetics. 2000 Jun;155(2):945-59. doi: 10.1093/genetics/155.2.945.
9
A theoretical basis for investigating ambient air pollution and children's respiratory health.
Environ Health Perspect. 1999 Jun;107 Suppl 3(Suppl 3):403-7. doi: 10.1289/ehp.99107s3403.
10
A study of twelve Southern California communities with differing levels and types of air pollution. I. Prevalence of respiratory morbidity.
Am J Respir Crit Care Med. 1999 Mar;159(3):760-7. doi: 10.1164/ajrccm.159.3.9804143.