Detecting genetic association through shortest paths in a bidirected graph.

Author information

Ueki Masao, Kawasaki Yoshinori, Tamiya Gen

Affiliations

Biostatistics Center, Kurume University, Fukuoka, Japan.

Department of Statistical Modeling, The Institute of Statistical Mathematics, The Graduate University for Advanced Studies, Tachikawa, Tokyo, Japan.

Publication information

Genet Epidemiol. 2017 Sep;41(6):481-497. doi: 10.1002/gepi.22051. Epub 2017 Jun 19.

DOI:10.1002/gepi.22051

PMID:28626864

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5849262/

Abstract

Genome-wide association studies (GWASs) commonly use marginal association tests for each single-nucleotide polymorphism (SNP). Because these tests treat SNPs as independent, their power will be suboptimal for detecting SNPs hidden by linkage disequilibrium (LD). One way to improve power is to use a multiple regression model. However, the large number of SNPs precludes simultaneous fitting with multiple regression, and subset regression is infeasible because of an exorbitant number of candidate subsets. We therefore propose a new method for detecting hidden SNPs that have a weak marginal association yet a significant association in a multiple regression model. Our method begins by constructing a bidirected graph locally around each SNP that demonstrates a moderately sized marginal association signal (a focal SNP). Vertexes correspond to SNPs, and adjacency between vertexes is defined by an LD measure. Subsequently, the method collects from each graph all shortest paths to the focal SNP. Finally, for each shortest path the method fits a multiple regression model to all the SNPs lying in the path and tests the significance of the regression coefficient corresponding to the terminal SNP in the path. Simulation studies show that the proposed method can detect susceptibility SNPs hidden by LD that go undetected with marginal association testing or with existing multivariate methods. When applied to real GWAS data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), our method detected two groups of SNPs: one in a region containing the apolipoprotein E (APOE) gene, and another in a region close to the semaphorin 5A (SEMA5A) gene.
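
The three steps named in the abstract (local graph, shortest paths, regression along each path) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several choices are assumptions made only for illustration: adjacency is defined by thresholding the squared genotype correlation (`threshold=0.2` is arbitrary), edges are treated as unweighted so shortest paths come from breadth-first search, the trait is quantitative and fitted by ordinary least squares, the "terminal SNP" is taken to be the end of the path opposite the focal SNP, and the p-value uses a normal approximation. The function names `ld_graph`, `all_shortest_paths`, and `terminal_snp_test` are hypothetical.

```python
import math
from collections import deque

import numpy as np


def ld_graph(G, threshold=0.2):
    """Adjacency lists over SNP columns of G (samples x SNPs).

    Assumption: two SNPs are adjacent when their squared Pearson
    correlation exceeds `threshold` (a stand-in for an LD measure).
    """
    r2 = np.corrcoef(G, rowvar=False) ** 2
    p = G.shape[1]
    return {i: [j for j in range(p) if j != i and r2[i, j] > threshold]
            for i in range(p)}


def all_shortest_paths(adj, focal):
    """Every shortest path from `focal` to each reachable vertex (BFS)."""
    dist = {focal: 0}
    paths = {focal: [[focal]]}
    queue = deque([focal])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first time v is reached
                dist[v] = dist[u] + 1
                paths[v] = []
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u is a shortest-path predecessor of v
                paths[v].extend(path + [v] for path in paths[u])
    return [path for v, ps in paths.items() if v != focal for path in ps]


def terminal_snp_test(G, y, path):
    """Regress y on all SNPs in `path`; test the last SNP's coefficient.

    Returns (z, p_value); the p-value is a two-sided normal approximation.
    """
    X = np.column_stack([np.ones(len(y)), G[:, path]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = float(resid @ resid) / dof
    cov = sigma2 * np.linalg.pinv(X.T @ X)   # pinv guards against collinear SNPs
    z = float(beta[-1]) / math.sqrt(cov[-1, -1])
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

A typical use would be to call `ld_graph` on the genotypes in a window around one focal SNP, pass the result to `all_shortest_paths`, and then apply `terminal_snp_test` to each returned path. Under these assumptions, a SNP whose marginal signal is masked by LD can still show a significant coefficient once the SNPs on its path to the focal SNP are adjusted for.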
